SECTION 1


INTRODUCTION 

The  Goals  of  Computer  Language  Design 

The  universe  and  its  reflection  in  the  ideas  of  man  have  wonderfully 
complex  structures.  Our  ability  to  comprehend  this  complexity  and  perceive 
an  underlying  simplicity  is  intimately  bound  with  our  ability  to  symbolize 
and  communicate  our  experience.  The  scientist  has  been  free  to  extend  and 
invent language whenever old forms became unwieldy or inadequate to express
his ideas. His readers, however, have faced the double task of learning
his new language and the new structures he described. There has therefore
arisen  a  natural  control:  a  work  of  elaborate  linguistic  inventiveness  and 
meager  results  will  not  be  widely  read. 

As  the  computer  scientist  represents  and  manipulates  information 
within  a  machine,  he  is  simulating  to  some  extent  his  own  mental  processes. 
He  must,  if  he  is  to  make  substantial  progress,  have  linguistic  constructs 
capable  of  communicating  arbitrarily  complicated  information  structures 
and processes to his machine. One might expect the balance between linguistic
elaboration and achieved results to be operable. Unfortunately, the
computer  scientist,  before  he  can  obtain  his  results,  must  successfully 
teach  his  language  to  one  particularly  recalcitrant  reader:  the  computer 
itself.  This  teaching  task,  called  compiler  writing,  has  been  formidable. 

Consequently, the computing community has assembled, under the
banner of standardization, a considerable movement for the acceptance of
a few committee-defined languages for the statement of all computer
processes. The twin ideals of a common language for programmers and
the immediate interchangeability of programs among machines have largely
failed to materialize. The main reason for the failure is that programmers,
like all scientists before them, have never been wholly satisfied
with  their  heritage  of  linguistic  constructs.  We  hold  that  the  demand  for 
a  fixed  standard  programming  language  is  the  antithesis  of  a  desire  for 
progress  in  computer  science.  That  the  major  responsibility  for  computer 
language  design  should  rest  with  the  language  user  will  be  our  central 
theme.

The  reduction  of  compiler  writing  to  a  task  that  a  language  user 
might  reasonably  wish  to  undertake  is  the  major  technical  obstacle.  We 
are not alone in our desire to simplify compiler writing [4, 7, 17, 22, 25],
and we must justify our particular approach in some detail.

We  postulate  the  existence  of  a  set  of  basic  concepts  common  to  all 
computing tasks. A language which includes just the basic concepts we
will call a kernel language. The implementation of a compiler for a
kernel language we will call an extendable compiler. We do not expect
agreement  on  what  constitutes  the  set  of  basic  concepts  or  on  the  best 
kernel  language  to  represent  them.  We  do  hope  that  our  kernel  language 
will  be  noncontroversial  enough  that  the  user  will  not  be  seriously 
hampered  in  building  a  language  to  suit  his  needs. 

Our  first  claim  is  that  modifying  an  extendable  compiler  is  easier 
than  building  a  compiler  from  first  principles.  The  primary  reason  for 
this  is  that  the  user  of  an  extendable  compiler  can  largely  ignore  the 
details of such mechanisms as text scanning, syntactic analysis and program
loading while concentrating on translating his forms (syntax) into
his meaning (semantics). In many compiler systems the mechanisms for
syntactic  and  semantic  analysis,  scanning,  building  tables  and  code 
production  are  inextricably  entwined,  making  a  change  to  any  one  of  them 
hazardous,  even  for  the  expert.  In  our  extendable  compiler  such  functions 
are  cleanly  separated,  both  conceptually  and  physically  in  the  text  of 
the  compiler  program. 

Our  second  claim  involves  the  syntactic  description  of  the  user’s 
language.  We  demand  a  phrase  structure  grammar  (BNF,  Backus-Naur  Form, 
Chomsky  type  II,  context  free,  etc.)  from  which  a  syntax  preprocessor 
generates  syntactic  recognition  tables  for  physical  insertion  into  the 
compiler.  We  can  show  that  if  the  syntax  preprocessor  accepts  the  phrase 
structure  grammar  without  complaint,  then  the  syntactic  analyzer  in  the 
compiler  will  always  function  correctly.  In  short,  we  can  prevent  even 
the  naive  user  from  blundering  into  an  ambiguous  or  otherwise  ill-defined 
grammar.

Finally,  we  claim  that  the  kernel  language  is  a  powerful  and  concise 
base  upon  which  to  build. 

Review  of  the  Literature  and  Summary 

We  assume  (for  the  moment)  the  reader  is  familiar  with  the  notion 
of  a  context-free  grammar.  The  central  problem  in  writing  a  compiler  for 
a  language  described  by  a  context-free  grammar  is  the  construction  of  an 
algorithm  which  will  efficiently  discover  the  grammatical  structure  of 
an  arbitrary  input  text.  And  the  basic  step  in  a  parsing  algorithm  is 
the  identification  of  a  substring  in  the  text  which,  when  replaced  by 
application of a rewriting rule, brings us closer to the goal of an analyzed
text.

A string is a candidate for rewriting if it is identical to the
right-hand side of a rewriting rule. If two or more candidates for
rewriting  overlap,  then  at  most  one  of  the  rewritings  can  lead  to  a 
correct  analysis.  In  Bounded  Context  Syntactic  Analysis  Floyd  explores 
the  possibility  of  making  the  decision  by  examining  a  fixed  number  of 
characters  to  the  left  and  right  of  the  candidate.  A  grammar  for  which 
such a decision is always possible is said to be of bounded context. Floyd
shows that, if we choose the left and right bounds, we can determine whether
a given grammar is of bounded context for the chosen bounds. The
construction  of  a  parsing  algorithm  then  simply  demands  the  construction 
of  tables  for  the  relevant  contexts. 

We  immediately  discover  two  difficulties.  First,  straightforward 
application  of  the  ideas  for  a  practical  language  results  in  tables  of 
impractical  size.  Floyd  points  out  several  simplifications  based  on 
particular  algorithms  (such  as  a  left-to-right  scan  of  the  text).  But 
the  main  difficulty  is  that  the  amount  of  table  required  for  the  hardest 
decision  is  required  for  all  decisions.  Second,  there  are  three  decisions 
involved:  where  is  the  left  end  of  the  candidate,  where  is  the  right  end, 
and  what  may  we  substitute  for  it.  As  might  be  expected,  the  bounds  for 
the  individual  decisions  are  usually  smaller  than  those  of  Floyd,  resulting 
in a reduction of the table size.

In  Syntactic  Analysis  and  Operator  Precedence  Floyd  presents  a 
particular  algorithm  for  making  the  parsing  decisions.  The  algorithm  is 
not  properly  a  parsing  algorithm  since  it  skips  some  steps  in  the  analysis 
thus  failing  to  give  the  complete  structure  of  the  text  under  consideration. 
It is, on the other hand, more efficient for skipping them. The compiler
writer must in each case decide whether the analysis provided is sufficiently
complete. We also come immediately to face the problem that for
some  purposes  the  class  of  grammars  acceptable  to  the  algorithm  is  too 
restricted. 

In Euler: A Generalization of Algol 60, and its Formal Definition,
Wirth and Weber modify Floyd's algorithm to remove some of the restrictiveness
on the acceptable grammar and also expand it into a proper analysis
algorithm. No progress is made in reducing the size of the tables demanded
by expanding the context.

In this paper we explore the implications of splitting the parsing
decision into its three components. For context bounds of (1,1) the
allowed grammars turn out to be identical to those of Wirth and Weber.
For bounds of (2,1) for finding the left boundary, bounds of (1,2) for
finding the right boundary and (0,0) for choosing the result of the
rewriting we find a substantial improvement in the table size, but the
tables are still impractically large.

Also  in  Euler  ...  we  find  that  not  only  the  form  of  the  language 
but  also  the  sequence  of  parsing  steps  is  significant  in  the  design  of 
a  compiler.  The  sequence  of  steps  proceeding  strictly  from  left  to  right 
in  the  text  is  called  the  canonical  parse.  The  canonical  parse  turns  out  to 
be  a  natural  vehicle  for  describing  the  sequence  of  execution  in  the 
compiled  program  as  well  as  for  proving  a  given  class  of  grammars  unambiguous. 

In  language  design  we  attempt  two  goals:  to  present  a  language 
simpler  and  more  powerful  than  Euler,  and  to  make  the  defining  mechanism 
sufficiently  simple  so  that  the  language  user  can  change  the  language  to 


Our first action is to equate those constructs in other languages
that are conceptually similar but take different forms (switches, procedures
and name parameters) (lists, blocks, compound statements, parameter
lists, iteration lists). Our second step is to integrate the concept
of a list-valued constant into the language structure itself.

We  describe  the  resulting  language  and  compiler  in  some  detail. 
SECTION 2


COMPUTER  LANGUAGE  DEFINITION 

Production  Grammars 

As  can  be  seen  by  examining  Table  1,  there  is  little  unanimity 
among  authors  regarding  the  formalisms  for  the  description  of  production 
grammars.  While  our  notation  adheres  closely  to  the  consensus,  our 
readers  may  wish  to  refer  to  the  table  for  a  more  familiar  terminology. 

We define three primitive entities: (1) the vocabulary V, a finite
set of elements called symbols, (2) a null string of symbols, Λ, and
(3) the operation of catenation between strings and symbols, denoted by
juxtaposition. In terms of the primitive entities we make the following
definitions:

$$V^* = \{\, x \mid x = \Lambda \ \text{or}\ (\exists y)(\exists Y),\ y \in V^*,\ Y \in V,\ x = yY \,\}$$

is  the  set  of  all  strings  that  can  be  formed  from  the  elements  of  set  V. 
Note that we have used lower case Latin letters to denote members of V*
and upper case Latin letters to denote members of V. This convention
is  extremely  useful  and  we  will  adhere  to  it  henceforth,  usually  without 
explicit  reminder. 
Table 1

A resume of notations used in recent papers on
production grammars.

[Table body not recoverable from the source. Its rows compare the notations
of Chomsky [3], Eickel and Paul [5, 6], Ginsburg [11], Greibach [12],
Floyd [7], Knuth [16], Wirth and Weber [25], and McKeeman.]

The arrows of Eickel and Paul, like those of Gilbert [10], have the
sense of reduction as opposed to the more standard sense of production.
$l \to r$, a production, is an ordered pair with both $l, r \in V^*$. We call l
the left of the production, r the right of the production, and read the
production as l produces r.

P is a finite set of productions.

$V_L = \{\, U \mid (\exists x)(\exists y)(\exists z)\ \text{with}\ yUz \to x\ \text{in}\ P \,\}$ is the set of symbols on
the left in P.

$V_R = \{\, U \mid (\exists x)(\exists y)(\exists z)\ \text{with}\ x \to yUz\ \text{in}\ P \,\}$ is the set of symbols on
the right in P.

$V_T = V_R - V_L$ is the set of terminal symbols.

$V_N = V - V_T$, the complement of $V_T$, is the set of nonterminal symbols.

$V_G = V_L - V_R$ is the set of symbols appearing only on the left in productions.

We call $V_G$ the set of goal symbols. If $l \to r$ is in P, then for any
x and y we may write $xly \to xry$ and read xly directly produces xry,
or xry directly reduces to xly. We immediately note that for every
production, the left of the production directly produces the right of the
production. We regard each production as a rewriting rule allowing the
substitution of the right of the production for any occurrence of the left
of the production in any string. If a string is in $V_T^*$ then there can
be no applicable production and the process of production must halt,
hence the name terminal symbols is applied to $V_T$.

One  may  also  regard  a  production  as  a  rewriting  rule  in  the  direction 
opposite  to  the  arrow.  In  that  case  the  rule  would  be  called  a  reduction. 

In  simplest  terms,  we  would  think  of  speaking  as  involving  actions  of 
production  and  listening  as  involving  actions  of  reduction.  It  will  be 
convenient  to  phrase  our  theorems  in  terms  of  productions  while  our 
programs  are  capable  only  of  reductions. 
If $y = x_0 \to x_1 \to \cdots \to x_n = z$ for $n \ge 0$, then we write $y \Rightarrow z$
and read y produces z or z reduces to y.

If we write $y \to\Rightarrow z$ we mean $y \Rightarrow z$ with $n \ge 1$.

The set $DS(P) = \{\, x \mid (\exists G),\ G \in V_G,\ G \to\Rightarrow x \,\}$ is the set of strings
derivable in P.

$L(P) = DS(P) \cap V_T^*$ is called the language defined by P. The members
of L(P) are called the sentences of the language. Note that it is the
sentences that can be written as text and we need be concerned only with
the analysis of sentences.

Since a language is fully determined by the set of its productions P,
we will refer to the set of productions as the grammar P. We lose the
generality of being able to select a single member of $V_G$ as the distinguished
symbol, but the loss does not affect our considerations since we
have other reasons to restrict $V_G$ to a single unique member.

For example: Let

$$P = \{\, G \to X,\ X \to XX,\ X \to Y \,\}.$$

Then:

$$V = \{G, X, Y\},$$
$$V_T = \{Y\},$$
$$V_N = \{G, X\},$$
$$V_G = \{G\},$$
$$V^* = \{\Lambda, G, X, Y, GG, GX, GY, XG, \ldots\},$$
$$V_T^* = \{\Lambda, Y, YY, YYY, \ldots\},$$
$$L(P) = \{Y, YY, YYY, \ldots\}.$$

$$G \to X \to XX \to XXX \to XXY \to XYY \to YYY$$

is an explicit demonstration of the fact that G produces the string
YYY ($G \Rightarrow YYY$).
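The sets in this example can be checked mechanically. The following Python sketch is not part of the original text. It assumes that every symbol is a single character, that a production is a (left, right) pair of strings, that V consists of the symbols actually occurring in P, and that the terminal symbols are those occurring on the right but never on the left. The names `symbol_sets` and `sentences` are hypothetical, and sentences are enumerated only up to a length bound, which is safe here because no production shortens a string.

```python
# Productions of the example grammar as (left, right) pairs of strings.
# Assumption of this sketch: every symbol is a single character.
P = [("G", "X"), ("X", "XX"), ("X", "Y")]


def symbol_sets(productions):
    """Return (V, V_T, V_N, V_G) for the symbols that occur in the productions."""
    v_left = {s for left, _ in productions for s in left}
    v_right = {s for _, right in productions for s in right}
    v = v_left | v_right
    v_t = v_right - v_left   # terminal: on the right, never on the left
    v_n = v - v_t            # nonterminal: complement of V_T
    v_g = v_left - v_right   # goal: appears only on the left
    return v, v_t, v_n, v_g


def sentences(productions, max_len):
    """List the sentences of L(P) having at most max_len symbols.

    Assumes no production shortens a string, so longer forms can be pruned.
    """
    _, v_t, _, v_g = symbol_sets(productions)
    seen, frontier = set(v_g), list(v_g)
    while frontier:
        form = frontier.pop()
        for left, right in productions:
            at = form.find(left)
            while at != -1:
                # Rewrite this occurrence of the left part by the right part.
                new = form[:at] + right + form[at + len(left):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    frontier.append(new)
                at = form.find(left, at + 1)
    # Keep only the derived strings made entirely of terminal symbols.
    return sorted((s for s in seen if set(s) <= v_t), key=lambda s: (len(s), s))


print(symbol_sets(P))
print(sentences(P, 3))
```

The search applies productions at every position, not only at the rightmost nonterminal, since the definition of DS(P) places no restriction on the order of rewriting.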


We now direct our attention to a subset of production grammars,
called phrase structure grammars, in which the form of the productions
is restricted to $L \to r$.¹

We will also assume two additional restrictions:

(1) $V_G$ has a single element; we will designate it by G.

(2) $(\forall X)(\exists t)$ such that $X \in V_N$, $t \in V_T^*$ and $X \Rightarrow t$.

The alternative to restriction (1) is to distinguish one member
of $V_G$ explicitly in the description of the language. We reject this
for two reasons: first, the productions describing the other members of
$V_G$ can be discarded since they can never be used in an analysis; second,
we like to be able to test the productions for the existence of a unique
goal as a check against programmer errors.

Restriction  (2)  excludes  grammars  that  give  rise  to  derivations 
that  can  never  terminate  in  a  sentence.  It  happens  that  condition  (2) 
is  also  required  to  prove  the  equivalence  of  simple  precedence  grammars 
and  symbol  pair  grammars  (see  page  27). 

The  Canonical  Parse 

If $xYz \to xyz$ and $z \in V_T^*$, then we call the ordered pair
(xYz, xyz) a canonical parsing step (abbreviated CPS). Note that it
is the rightmost nonterminal symbol (RNS) that is replaced in a CPS.
If every step in $s \Rightarrow t$ is a CPS, we call the sequence of steps a
canonical parse. A CPS induces a partition (xyz) on the unreduced text.
Knuth calls the segment y the handle [16], which unfortunately conflicts
with Greibach's term handle [12]. Wirth and Weber [25] call y the
leftmost reducible substring, which implies a relation that we do not wish
to pursue. We will give y the name canonically reducible string and
abbreviate it CRS. For a particular CPS, the CRS is well defined.

¹ Note that we imply $L \in V$ and $r \in V^*$ by our conventions on upper
and lower case.

If  we  view  the  CPS  in  the  sense  of  production,  we  see  that  zero 
or  more  symbols  are  added  to  the  terminal  string  to  the  right  of  the 
rightmost  nonterminal  symbol.  Therefore  the  length  of  the  string  z  of 
terminal  symbols  is  a  monotonic  function  of  the  number  of  canonical 
parsing  steps.  Now  viewed  as  a  reduction,  we  see  that  the  canonical 
parse enforces exactly the same order of productions as required by a
left-to-right  scan  of  the  sentence. 

Because  of  its  relation  to  left-to-right  parsing  algorithms,  the 
concept  of  a  canonical  parse  has  appeared  in  many  forms.  It  was  first 
explicitly  named  in  [5]  and  [25]  independently. 

A  sentence  which  has  two  essentially  different  structures  is 
called  ambiguous.  Formally,  a  sentence  is  unambiguous  if  and  only  if  it 
has a unique canonical parse. Furthermore, a language containing an
ambiguous  sentence  is  ambiguous;  a  grammar  defining  an  ambiguous  language 
is  ambiguous. 

The  reader  should  verify  that  the  grammar,  language  and  sentence 
in  the  preceding  example  are  formally  ambiguous  according  to  our  definition. 
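One way to carry out this verification is to enumerate canonical parses by brute force. The sketch below is an illustration, not the author's method: it rewrites only the rightmost nonterminal symbol at each step, so every path it finds from G to the target is a canonical parse. It assumes single-character symbols, no production that shortens a string, and no cycle of single-symbol productions (all true of the example grammar); the name `canonical_parses` is hypothetical.

```python
# The example grammar; each symbol is one character (an assumption of this sketch).
P = [("G", "X"), ("X", "XX"), ("X", "Y")]
NONTERMINALS = {left for left, _ in P}


def canonical_parses(target, goal="G"):
    """Return every canonical parse of target, each as a list of strings
    running from the goal symbol down to the sentence."""
    results = []

    def extend(form, history):
        if form == target:
            results.append(history)
            return
        # Only the rightmost nonterminal symbol may be rewritten in a CPS.
        rns = max((i for i, s in enumerate(form) if s in NONTERMINALS), default=-1)
        if rns < 0 or len(form) > len(target):
            return
        # The terminal string to the right of the RNS is final, so it must
        # already agree with the end of the target.
        if not target.endswith(form[rns + 1:]):
            return
        for left, right in P:
            if left == form[rns]:
                new = form[:rns] + right + form[rns + 1:]
                extend(new, history + [new])

    extend(goal, [goal])
    return results


for parse in canonical_parses("YYY"):
    print(" -> ".join(parse))
```

Under these assumptions the sentence YYY has more than one canonical parse while YY has exactly one, so the sentence YYY, and with it the language and the grammar, is ambiguous in the sense defined above.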
The Parsing Function


The problem of parsing a text t reduces to finding, at each stage,
the string $t_i$ so that $t_i \to t_{i-1}$ is a CPS. If a sentence t is
unambiguous then we see immediately that each intermediate stage of its
derivation is unambiguous. In particular, we note that for all i, $t_i$
is uniquely determined by $t_{i-1}$ alone.¹ We can therefore infer the
existence of a uniquely valued parsing function P such that $P(t_{i-1}) = t_i$.
The following algorithm is the complete solution to the problem of parsing
an unambiguous sentence.


[Figure: flowchart of the parsing algorithm, running from START to STOP
with stage index i.]


The  assured  existence  of  the  function  P  is,  however,  of  little 
use  in  constructing  a  translator.  The  only  way  to  compute  its  values  in 
general  is  to  parse  the  sentence  t  and  record  the  results  in  a  table 
(which rather begs the question).

¹ For otherwise we would have two canonical parses of t.

It is surprising to find that for a restricted set of phrase
structure grammars, we can find economical ways of computing the parsing
function. Two [7, 25] have been previously published. A third way, and
some steps toward a fourth, are presented below. Except that Floyd's algorithm
skips some CPS, all are special cases of the following detailed breakdown
of an algorithm to compute the function P.

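P1, P2, and P3 are functions of three string-valued variables
$\underline{x}$, $\underline{y}$ and $\underline{z}$. For the moment we will underline program variables to distinguish
them from values with the same name but derived from the canonical
parse. If the catenation $\underline{x}\,\underline{y}\,\underline{z}$ is in DS(P) and L(P) is unambiguous
then there is a unique partition $\underline{x}\,\underline{y}\,\underline{z} = xyz$ of the catenation of strings
in the program variables $\underline{x}$, $\underline{y}$ and $\underline{z}$ and a unique production $Y \to y$ in P
such that $G \Rightarrow xYz \to xyz$ is canonical. We give an Algol-like definition
of the functions in terms of the partition and production as follows:

$P1(\underline{x},\underline{y},\underline{z})$:  If $G \to\Rightarrow \underline{x}\,\underline{y}\,\underline{z}$ then
    ($\underline{x} = xy$ and $\underline{y} = \Lambda$ and $\underline{z} = z$) else undefined;

$P2(\underline{x},\underline{y},\underline{z})$:  If $G \to\Rightarrow \underline{x}\,\underline{y}\,\underline{z}$ then
    ($\underline{x} = x$ and $\underline{y} = y$ and $\underline{z} = z$) else undefined;

$P3(\underline{x},\underline{y},\underline{z})$:  If $P2(\underline{x},\underline{y},\underline{z})$ then Y else undefined.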


[Figure: The general parsing algorithm (flowchart from START to STOP).]


In terms of a syntactic analysis algorithm, we would assign the
following individual responsibilities to the functions:

P1: read the input tape.

P2: locate the CRS y to be replaced.

P3: perform the reduction.

Due to the monotonicity of the length of z, we must decide before
each CPS whether to shorten z. At the termination of the loop on P1,
we have assured ourselves that all of the CRS is on the tail of x. We
have located one boundary of y. The left boundary is found in the
loop on P2. At the termination of the larger loop, we substitute Y
for y, leaving the nonterminal symbol Y on the tail of x. If we have
reduced the entire string to the goal we are through. Otherwise, we return
to the loop on P1.

A cycle through the functions P1, P2, and P3 is equivalent to
a single step on the function P. The string xyz is always identical,
at the end of the main cycle, to the value of P(xyz). The main reason
for introducing the functions P1, P2, and P3 is that their values can
be handled as reasonable computational entities. The parameters of the
functions are still unwieldy, which reflects the fact that the function
values may depend upon an examination of the entire text.
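The control structure just described can be sketched in Python. This is an illustration under stated assumptions, not the author's program: the grammar (G -> E, E -> E+T | T, T -> i) is a hypothetical unambiguous example with single-character symbols, and the functions P1, P2 and P3 are computed in the question-begging way the text mentions, by first finding the canonical derivation and recording its steps in a table. Only the loop structure of `parse` mirrors the general parsing algorithm; the names `canonical_derivation` and `parse` are inventions of this sketch.

```python
# Hypothetical unambiguous grammar: G -> E, E -> E+T | T, T -> i.
P = [("G", "E"), ("E", "E+T"), ("E", "T"), ("T", "i")]
NONTERMINALS = {left for left, _ in P}


def canonical_derivation(target, goal="G"):
    """Find the rightmost derivation of target; return its steps (x, Y, y, z),
    each meaning xYz -> xyz, or None if target is not a sentence."""
    def search(form, steps):
        if form == target:
            return steps
        rns = max((i for i, s in enumerate(form) if s in NONTERMINALS), default=-1)
        if rns < 0 or len(form) > len(target) or not target.endswith(form[rns + 1:]):
            return None
        x, z = form[:rns], form[rns + 1:]
        for left, right in P:
            if left == form[rns]:
                found = search(x + right + z, steps + [(x, left, right, z)])
                if found is not None:
                    return found
        return None
    return search(goal, [])


def parse(text, goal="G"):
    """Reduce text to the goal; return the reductions (Y, y) in parse order."""
    steps = canonical_derivation(text, goal)
    if steps is None:
        raise ValueError("input text is not a sentence")
    # Oracle: each sentential form xyz maps to its canonical partition and Y.
    table = {x + y + z: (x, y, z, left) for x, left, y, z in steps}

    def p1(x, y, z):  # true when the whole CRS lies on the tail of x
        cx, cy, cz, _ = table[x + y + z]
        return x == cx + cy and y == "" and z == cz

    def p2(x, y, z):  # true when y is exactly the CRS
        cx, cy, cz, _ = table[x + y + z]
        return x == cx and y == cy and z == cz

    def p3(x, y, z):  # the left part Y of the production Y -> y
        return table[x + y + z][3]

    x, y, z = "", "", text
    reductions = []
    while x + y + z != goal:
        while not p1(x, y, z):
            x, z = x + z[0], z[1:]        # read one symbol of input
        while not p2(x, y, z):
            x, y = x[:-1], x[-1] + y      # move the left boundary of y
        left = p3(x, y, z)
        reductions.append((left, y))
        x, y = x + left, ""               # perform the reduction
    return reductions


print(parse("i+i"))
```

The two inner loops never run off the end of a string, for the reasons given in the correctness proof: while P1 is false at least one symbol remains in z, and while P2 is false at least one symbol remains in x.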

Theorem.  If  the  input  text  is  a  sentence  and  the  grammar  is  unambiguous, 
the  general  parsing  algorithm  will  reduce  the  input  text  to  the  goal 
symbol  via  the  canonical  parse. 

Before attempting the proof we must describe our general method
for proving the correctness of algorithms. The basic mechanism of
inductive closure for program loops is described by Floyd [9] as one
technique of a verifying compiler. We state an initial set of relations
that we know to be true upon entry to the loop. We then show that they
are invariant with respect to execution of the loop, hence they are
always true. We finally deduce some relations that are true at the completion
of the loop, either as a final result, or as a component in a proof
on a larger enclosing loop. In our deductions we must ensure that all
actions are defined and all loops terminate. Relations true upon exit
from the algorithm are then correct descriptions of the final state of
the algorithm.

Proof. If the grammar is ambiguous, the parsing functions are not
uniquely defined and it is meaningless to state the parsing algorithm.
Similarly, if the input text is not a sentence, all of P1, P2, and P3
are immediately undefined.

We need to show that we complete one CPS each time through the
outer loop and that the process terminates in a finite number of steps.
We cannot analyze the outer loop until we understand the inner loops.
We consider the loop on P1 first.

We assume the truth of the relations listed at the top of the loop
and derive those at the bottom. Since $G \to\Rightarrow \underline{x}\,\underline{y}\,\underline{z}$, P1 is defined.
If it is false, $\underline{x} \ne xy$. But we have $|\underline{x}| \le |xy|$, hence we derive
$|\underline{x}| < |xy|$. From $|\underline{y}| = 0$ we get $|\underline{x}\,\underline{y}| < |xy| \le |xyz|$, hence
$|\underline{x}\,\underline{y}| < |xyz|$. But $\underline{x}\,\underline{y}\,\underline{z} = xyz$, thus $|\underline{z}| > 0$. There is, therefore, at
least one character in $\underline{z}$ and the action in the box is defined. Furthermore,
all of the assumptions are unaffected by the action, hence are
invariants of the loop. The loop must terminate because $\underline{z}$ is of finite
length. When P1 becomes true, the conditions on exit from the loop are
consequences of the definition of P1.

Now consider the loop on P2 with the results of P1 as assumptions.

P2 is initially defined and will remain so. Since $\underline{z}$ is never
affected, we have $\underline{z} = z$ everywhere. If P2 is false we have either
$\underline{x} \ne x$ or $\underline{y} \ne y$. But either inequality implies the other, so we have both.
From $|\underline{y}| \le |y|$ we derive $|\underline{y}| < |y|$, hence $|\underline{x}| > 0$. Therefore the
action in the box is defined. All the assumptions are preserved in the
loop. The loop must terminate because $\underline{x}$ is of finite length, yielding the
stated relations as a consequence of the definition of P2.

For the entire algorithm we can now write

[Figure: annotated flowchart of the entire algorithm, ending at STOP.]

By our assumptions, the input text is a sentence and we have
$G \to\Rightarrow \underline{x}\,\underline{y}\,\underline{z}$ and its ramifications. Since $|\underline{x}| = 0$ initially,
$|\underline{x}| \le |xy|$ is vacuously true. P3 is defined and has value Y.
$xYz \to xyz$ is a CPS by definition, hence we have new $\underline{x} = xY$, $\underline{y} = \Lambda$,
and $\underline{z} = z$ with $G \Rightarrow \underline{x}\,\underline{y}\,\underline{z}$. If $G = \underline{x}\,\underline{y}\,\underline{z}$, we are done. Otherwise, we may
write again $G \to\Rightarrow \underline{x}\,\underline{y}\,\underline{z}$ and define new x, y, and z. Since $z \in V_T^*$,
xy must contain all of the nonterminal symbols. The last symbol of the
new $\underline{x}$ is nonterminal, giving the required $|\underline{x}| \le |xy|$. We find our
assumptions invariant and also a consequence of the initial conditions.
The loop must terminate since there are a finite number of steps in a
canonical parse. QED.
Symbol Pair Parsing Functions

If  we  wish  to  find  a  reasonably  efficient  method  for  computing  the 
parsing  functions,  we  must  renounce  the  privilege  of  examining  the  entire 
text  at  each  stage.  We  will  see  that  the  effect  of  narrowing  the  view  of 
the  parsing  functions  will  be  to  reduce  the  class  of  grammars  for  which 
we  can  build  mechanical  translators. 

We  first  postulate  that  the  parsing  functions  depend  only  upon  a 
few  symbols  in  the  region  of  the  CRS.  We  will  be  able  to  verify  our 
postulate mechanically;¹ if it is false then the grammar in question lies
outside  the  range  of  that  particular  analysis. 

Our  approach  will  be  to  examine  the  grammar  (mechanically,  as  it 
is  very  tedious)  to  discover  all  the  sequences  of  symbols  that  can  possibly 
occur  in  the  region  of  the  next  CRS.  For  each  possible  sequence  we  will 
record  the  required  value  of  the  parsing  functions.  When  the  resulting 
functions  are  well  defined  the  grammar  is  unambiguous  and  the  syntactic 
analysis algorithm in the compiler always functions correctly. The function
values are inserted into the compiler in a condensed tabulated form.

Consider the three new functions P1', P2', and P3' defined in
terms of P1, P2, and P3.

If $P1(\underline{x},\underline{y},\underline{z})$ is defined, X is the last symbol of $\underline{x}$ and Z is
the first symbol of $\underline{z}$, then we define P1'(X,Z) to be identical to
$P1(\underline{x},\underline{y},\underline{z})$. Similarly, P2'(X,Z) must be identical to $P2(\underline{x},\underline{y},\underline{z})$
when P2 is defined, X is the last symbol of $\underline{x}$ and Z is the first
symbol of the catenation $\underline{y}\,\underline{z}$. $P3'(\underline{y})$ must be identical to
$P3(\underline{x},\underline{y},\underline{z})$ when P3 is defined. We will call a grammar for which the functions
P1', P2', and P3' are well defined a symbol pair grammar (or more
generally, as we will see, a (1,1)(1,1) canonical parse grammar). We
will be able to show that under restrictions (1) and (2) (page 11),
symbol pair grammars are equivalent to simple precedence grammars [25].

The number of arguments for which P1, P2, and P3 are defined is,
in general, infinite. On the other hand, if NSY is the number of symbols
in V, P1' and P2' need be defined for at most NSY squared possible
arguments. P3' is defined only when $\underline{y}$ is the right part of a production
and thus also has a finite number of possible arguments. It is immediately
clear that we must apply a new restriction in order to make P3' well
defined:

Restriction  (3):  No  two  productions  may  have  equal  right  parts. 

We may, as has been pointed out to the author by N. Wirth, lift restriction
(3) if we have any way of distinguishing equal right parts. A particular
case in point is the Algol 60 <identifier>, which we might wish to reduce
to <array identifier>, or to <variable>, etc., where the decision can be
made due to other non-grammatical information. We will call the number of
productions (and, under restriction (3), the number of CRS) NPR.
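Restriction (3) is easy to test mechanically. The following Python sketch is an illustration only: it assumes productions are given as (left, right) pairs of strings, and the names `equal_right_parts` and `p3_table` are hypothetical. When the restriction holds, P3' reduces to a table lookup from right part to left symbol.

```python
from collections import defaultdict


def equal_right_parts(productions):
    """Map each right part shared by two or more productions to its left symbols."""
    lefts_by_right = defaultdict(list)
    for left, right in productions:
        lefts_by_right[right].append(left)
    return {r: ls for r, ls in lefts_by_right.items() if len(ls) > 1}


def p3_table(productions):
    """Build the table for P3' (right part -> left symbol) under restriction (3)."""
    clashes = equal_right_parts(productions)
    if clashes:
        raise ValueError(f"restriction (3) violated: {clashes}")
    return {right: left for left, right in productions}


# The earlier example grammar satisfies the restriction.
print(p3_table([("G", "X"), ("X", "XX"), ("X", "Y")]))
# Two productions with the same right part, as in the <identifier> case, do not.
print(equal_right_parts([("A", "i"), ("V", "i")]))
```

A compiler that lifts the restriction in the way suggested above would replace the exception with a decision based on non-grammatical information; that mechanism is outside this sketch.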

We see that the boundaries between $\underline{x}$ and $\underline{y}$ and between $\underline{y}$ and $\underline{z}$
in the general parsing algorithm always lie immediately to the left of,
within, or immediately to the right of the next CRS. The parameters X
and Z of P1' and P2' always lie on opposite sides of one of the
boundaries; the values of P1' and P2' depend upon where the boundaries
lie with respect to the CRS. We will be able to compute the position of
the boundaries with the help of the three following set definitions:


TS(X), the set of tail symbols of X, is given by

$$\{\, Y \mid (\exists y),\ X \to\Rightarrow yY \,\}.$$

HS(X), the set of head symbols of X, is given by

$$\{\, Y \mid (\exists y),\ X \to\Rightarrow Yy \,\}.$$

$HS_T(X)$, the set of terminal head symbols, is given by

$$(HS(X) \cup \{X\}) \cap V_T.$$

Note that if X is terminal, the first two sets are null but the third
is not.
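The three sets can be computed by a simple closure over the productions. The Python sketch below is an illustration under stated assumptions: the grammar (G -> E, E -> E+T | T, T -> i) is hypothetical, symbols are single characters, every production has a single symbol on the left and a non-empty right part, and the terminal symbols are taken to be those that occur on the right but never on the left. The function names are inventions of this sketch.

```python
# Hypothetical grammar for illustration: G -> E, E -> E+T | T, T -> i.
P = [("G", "E"), ("E", "E+T"), ("E", "T"), ("T", "i")]
V_T = {s for _, right in P for s in right} - {left for left, _ in P}


def _closure(x, pick):
    """Collect pick(right) over every production reachable from x in one or
    more steps, following only the picked symbol at each step."""
    found, frontier = set(), [x]
    while frontier:
        symbol = frontier.pop()
        for left, right in P:
            if left == symbol and pick(right) not in found:
                found.add(pick(right))
                frontier.append(pick(right))
    return found


def head_symbols(x):
    """HS(X): symbols that can begin a string produced by X."""
    return _closure(x, lambda right: right[0])


def tail_symbols(x):
    """TS(X): symbols that can end a string produced by X."""
    return _closure(x, lambda right: right[-1])


def terminal_head_symbols(x):
    """HS_T(X): (HS(X) united with {X}) restricted to terminal symbols."""
    return (head_symbols(x) | {x}) & V_T


for symbol in "GETi+":
    print(symbol, head_symbols(symbol), tail_symbols(symbol),
          terminal_head_symbols(symbol))
```

For a terminal symbol such as i the first two sets come out empty while the third contains the symbol itself, in agreement with the note above.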

When X is a tail symbol of the rightmost symbol in a CRS and Z
is a head symbol of anything that might follow that CRS in a sentence,
P1'(X,Z) must be true, and never otherwise. Similarly, whenever X lies
within a CRS and Z is a head symbol of the next symbol needed toward
the completion of that CRS, P1'(X,Z) must be false so that the needed
symbol is moved onto x. In terms of a production

$$W \to uUVv,$$

we cannot start building V if U has not yet been fully formed. Since
we have narrowed our view to one symbol on either side of the boundary,
we must never move any symbol in the head of V from z to x if the
last symbol of x is a tail symbol of U. If U has been formed and
is the last symbol of x, we must move any head symbol of V onto x
to start building toward V and finally uUVv. We may very well find
conflicting demands: a symbol that must be moved on account of one production
and must not be moved on account of another production. Conflicts
are common in practice and constitute a serious nuisance. The compiler
writer can usually modify his grammar in a trivial way to remove the
conflict. A more general solution would be to extend the view of the
parsing functions, an approach which is discussed later in this section.

The function of the loop on P2' is to march down across a given
CRS and locate its left boundary. In terms of the sample production, it
is clear that P2'(U,V) must be false for every pair U,V that are
contiguous within a CRS. P2'(X,Z) must be true whenever we cross the
left boundary of a CRS, a condition that holds when X lies within a
CRS and Z is a head symbol of the next item to be formed within that
CRS. We can summarize these relations with a mnemonic table:


    W -> uUVv

    P1'(U, HS_T(V))      =  false
    P1'(TS(U), HS_T(V))  =  true
    P2'(U, V)            =  false
    P2'(U, HS(V))        =  true
    P3'(uUVv)            =  W
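
The mnemonic table translates almost line for line into a table-building routine. The sketch below is a hedged illustration in Python, not the preprocessor presented later: it stores P1' and P2' as dictionaries keyed by symbol pairs, leaves unlisted pairs undefined, and reports a conflict whenever one pair is asked to be both true and false. The grammar, the end markers "|-" and "-|", and all function names are assumptions made for the example.

```python
def closure(grammar, x, pick):
    """Symbols reachable from x by repeatedly taking pick(right part)."""
    found, todo = set(), [x]
    while todo:
        sym = todo.pop()
        for left, right in grammar:
            if left == sym and pick(right) not in found:
                found.add(pick(right))
                todo.append(pick(right))
    return found


def build_tables(grammar):
    """Return (P1', P2', conflicts) for a list of (left, right-tuple) productions."""
    nonterminals = {left for left, _ in grammar}
    terminals = {s for _, right in grammar for s in right} - nonterminals

    def hs(s):
        return closure(grammar, s, lambda r: r[0])

    def ts(s):
        return closure(grammar, s, lambda r: r[-1])

    def hst(s):
        return (hs(s) | {s}) & terminals

    p1, p2, conflicts = {}, {}, []

    def record(table, name, pair, value):
        # setdefault keeps the first value; a different later value is a conflict
        if table.setdefault(pair, value) != value:
            conflicts.append((name, pair))

    for _, right in grammar:
        for u, v in zip(right, right[1:]):      # every adjacent pair U V
            record(p2, "P2'", (u, v), False)    # P2'(U, V) = false
            for z in hs(v):
                record(p2, "P2'", (u, z), True)  # P2'(U, HS(V)) = true
            for z in hst(v):
                record(p1, "P1'", (u, z), False)  # P1'(U, HS_T(V)) = false
                for x in ts(u):
                    record(p1, "P1'", (x, z), True)  # P1'(TS(U), HS_T(V)) = true
    return p1, p2, conflicts


# Hypothetical grammar: G' -> |- S -| ,  S -> a S b | c
EXAMPLE = [("G'", ("|-", "S", "-|")), ("S", ("a", "S", "b")), ("S", ("c",))]
P1, P2, CONFLICTS = build_tables(EXAMPLE)
```

A left-recursive grammar such as E -> E + T | T, T -> ( E ) | a shows the kind of conflict the text describes: the pair ( E must be contiguous (P2' false) and must also mark a left boundary (P2' true), because E is one of its own head symbols.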

We need only consider terminal symbols for P1' since we know that
z contains only terminal symbols. We are also implicitly assuming some
strings to be non-empty. We avoid this last problem by adding a production
leading to the goal symbol,

G' -> ⊢ G ⊣ ,

where ⊢ and ⊣ are end-of-file symbols that we may use to initialize x
and append to z. As modified, the parsing algorithm becomes:


The Symbol Pair Parsing Algorithm

START
    G := goal symbol ;
A:  X := last symbol of x ;  Z := first symbol of z ;
    if P1'(X,Z) is false:
        move first symbol of z to tail of x ;  go to A
B:  X := last symbol of x ;  Z := first symbol of yz ;
    if P2'(X,Z) is false:
        move last symbol of x to head of y ;  go to B
    y := P3'(y) ;
    x := xy ;  y := Λ ;
    if xyz = ⊢ G ⊣ is false:  go to A
STOP
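
To make the two loops concrete, here is a small Python sketch of the symbol pair parsing algorithm run against hand-built tables for a hypothetical grammar S -> a S b | c. The tables, the ASCII end markers "|-" and "-|", and the function name are assumptions for illustration. One further assumption is worth flagging: the sketch moves the last symbol of x into y once before the first P2' test, because with y empty the pair (X, Z) straddles the right end of the CRS, where the mnemonic table defines no value for P2'.

```python
# P1'(X, Z): True means the right end of the CRS has been reached.
P1 = {
    ("|-", "a"): False, ("|-", "c"): False,
    ("a", "a"): False, ("a", "c"): False,
    ("S", "b"): False, ("S", "-|"): False,
    ("c", "b"): True, ("b", "b"): True,
    ("c", "-|"): True, ("b", "-|"): True,
}
# P2'(X, Z): True means the left boundary of the CRS has been crossed.
P2 = {
    ("|-", "S"): False, ("S", "-|"): False,
    ("a", "S"): False, ("S", "b"): False,
    ("|-", "a"): True, ("|-", "c"): True,
    ("a", "a"): True, ("a", "c"): True,
}
# P3'(y): right part -> left part.
P3 = {("a", "S", "b"): "S", ("c",): "S"}


def parse(tokens, p1, p2, p3, goal, bof="|-", eof="-|"):
    """Return the list of reductions (left part, right part) in canonical order."""
    x, y, z = [bof], [], list(tokens) + [eof]
    reductions = []
    try:
        while True:
            # Loop on P1': move symbols from z to x until the CRS is complete.
            while not p1[(x[-1], z[0])]:
                x.append(z.pop(0))
            # Loop on P2': march left across the CRS, collecting it in y.
            y.insert(0, x.pop())  # assumed initial move, see lead-in
            while not p2[(x[-1], y[0])]:
                y.insert(0, x.pop())
            left = p3[tuple(y)]  # the reduction proper
            reductions.append((left, tuple(y)))
            x.append(left)
            y = []
            if x + z == [bof, goal, eof]:
                return reductions
    except (KeyError, IndexError):
        # An undefined table entry or exhausted text is treated as a syntax error.
        raise ValueError("syntax error near: " + " ".join(x + y + z)) from None
```

Calling `parse(list("aacbb"), P1, P2, P3, "S")` reduces c first and then the two enclosing a S b phrases; an input such as "ac" raises `ValueError`.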

We state some consequences of the definition of symbol pair grammars.


Theorem.  If  the  symbol  pair  parsing  algorithm  terminates  normally,  it 
has  produced  the  canonical  parse  for  the  input  text. 

Proof.  The  only  transformation  allowed  on  the  text  is  the  substitution 
of  the  leftpart  of  a  production  for  the  rightpart,  thus  it  is  immediately 
obvious  that  if  the  algorithm  functions  at  all,  it  produces  a  parse. 

After  each  substitution  we  see  that  the  newly  formed  reduced  symbol  is  the 
rightmost  nonterminal  symbol  in  the  text,  hence  that  step  was  a  CPS.  QED. 

Theorem.  A  symbol  pair  grammar  is  unambiguous.  ([23]  p.  26). 

Proof.  Assume  the  contrary.  Then  there  is  a  sentence  for  which  there 
exist  two  canonical  parses.  We  first  show  that  the  existence  of  two 
different  overlapping  CRS  implies  a  conflict  in  the  parsing  functions. 

Assume that our text is

$$x\, L_1 L_2 \cdots L_k\, M_1 M_2 \cdots M_m\, R_1 R_2 \cdots R_n\, z$$

and both of $L_1 \cdots M_m$ and $M_1 \cdots R_n$ are CRS with m > 0, and one
of k or n > 0.

We treat the case k > 0 in detail. From the fact that $L_1 \cdots M_m$
is a CRS we immediately derive $P1'(L_k, HS_T(M_1)) = \text{false}$ and
$P2'(L_k, M_1) = \text{false}$.
(We substitute the set as an argument of P1', meaning the relation is
true for all members of that set.) Now perform the rightmost reduction
and our text becomes

$$x\, L_1 L_2 \cdots L_k\, M z$$

where M was the leftpart of the production. Either $L_k$ and M are
next to each other in a production or further reduction brings us to the
text

$$x' L' M' z'$$

where L' and M' are next to each other in a production,

$$L' \Rightarrow u L_k , \qquad M' \Rightarrow M_1 v .$$

In the first case we have $P2'(L_k, M_1) = \text{true}$ and in the second,
$P1'(L_k, HS_T(M')) = \text{true}$. Either implies a conflict since
$M_1 \in HS(M')$ and $HS_T$ is never empty.

The situation is entirely similar for n > 0. Thus we find our
only choice during reduction is which of several disjoint CRS to pick.

Let  us  assume  that  we  pick  other  than  the  leftmost,  substituting  for  it 
the  nonterminal  symbol  in  the  leftpart  of  its  production.  There  is  a 
CRS  to  the  left  which  must  always  be  disjoint  from  all  other  CRS,  hence 
will  eventually  be  reduced  to  its  leftpart.  But  such  a  step  is  not  a 
CPS  because  we  have  already  formed  a  nonterminal  symbol  to  its  right. 

In  order  to  form  the  canonical  parse,  we  must  always  pick  the  leftmost 
CRS  and  it  is  unique,  thus  the  canonical  parse  is  unique  and  the  grammar 
is  unambiguous. 

We will now define the simple precedence grammars of Wirth and Weber
and show their equivalence, under restriction (2), to symbol pair grammars.
We define three relations, <·, ≐, ·>, between symbol pairs as follows.
For every production of the form W -> uUVv,

    U ≐ V,
    Z ∈ HS(V) implies U <· Z,
    X ∈ TS(U) implies X ·> V,
    X ∈ TS(U) and Z ∈ HS(V) imply X ·> Z.

If for each pair of symbols in V at most one of the above relations
holds, the grammar is a simple precedence grammar.

Theorem. If P is a simple precedence grammar, then P is a symbol
pair grammar. If P is a symbol pair grammar and restriction (2) holds,
then P is a simple precedence grammar. We immediately exhibit a symbol
pair grammar that violates restriction (2) and thus fails to be a simple
precedence grammar:

P = { G -> AB, A -> X, A -> XB, B -> C, C -> CY } .

The reader may find it instructive to build the six by six matrix of
precedence relations implied by the definition and find the two conflicts,
one of which is X <· C and X ·> C.
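
The exercise can also be checked mechanically. The following Python sketch applies the four defining rules to the grammar above and collects every symbol pair that receives more than one relation. The relation names "<", "=", ">" and the function names are illustrative choices, not notation from the text.

```python
def closure(grammar, x, pick):
    """Symbols reachable from x by repeatedly taking pick(right part)."""
    found, todo = set(), [x]
    while todo:
        sym = todo.pop()
        for left, right in grammar:
            if left == sym and pick(right) not in found:
                found.add(pick(right))
                todo.append(pick(right))
    return found


def precedence_relations(grammar):
    """Map each symbol pair to the set of relations the four rules assign to it."""
    def hs(s):
        return closure(grammar, s, lambda r: r[0])

    def ts(s):
        return closure(grammar, s, lambda r: r[-1])

    relations = {}

    def add(pair, rel):
        relations.setdefault(pair, set()).add(rel)

    for _, right in grammar:
        for u, v in zip(right, right[1:]):
            add((u, v), "=")              # U = V
            for z in hs(v):
                add((u, z), "<")          # Z in HS(V) implies U < Z
            for x in ts(u):
                add((x, v), ">")          # X in TS(U) implies X > V
                for z in hs(v):
                    add((x, z), ">")      # fourth rule: X > Z
    return relations


GRAMMAR = [
    ("G", ("A", "B")), ("A", ("X",)), ("A", ("X", "B")),
    ("B", ("C",)), ("C", ("C", "Y")),
]
CONFLICTS = {pair: rels
             for pair, rels in precedence_relations(GRAMMAR).items()
             if len(rels) > 1}
```

Under these rules the two conflicting pairs are (X, B), which is related by both ≐ and ·>, and (X, C), which is related by both <· and ·>.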

Proof.  We  show  that  if  P  is  not  a  simple  precedence  grammar  then  it 
is  not  a  symbol  pair  grammar  and  the  converse. 

Assume that P is not a simple precedence grammar. Then there
exist at least two symbols related by at least two of the three relations
<·, ≐, ·>. We treat each case separately.

(a). U ≐ V implies (∃W)(∃u)(∃v) such that W -> uUVv.

(b). U <· V implies (∃W)(∃u)(∃S)(∃v) such that W -> uUSv
     and V ∈ HS(S).

(c). U ·> V implies (∃W)(∃u)(∃R)(∃v) such that W -> uRVv
     with U ∈ TS(R), or
     (∃W)(∃u)(∃R)(∃S)(∃v) such that
     W -> uRSv with U ∈ TS(R) and V ∈ HS(S).

From the existence of a relation between two symbols we have been
able to infer the existence of the production from which the relation
was derived. Now from the productions we can derive some values for
the functions P1' and P2'.

(a) implies P2'(U,V) is false and (∀X), X ∈ HS_T(V)
gives P1'(U,X) is false.

(b) implies P2'(U,V) is true and (∀X), X ∈ HS_T(S)
gives P1'(U,X) is false. Also HS_T(V) ⊂ HS_T(S).

(c) implies (∀X), X ∈ HS_T(V) gives P1'(U,X) = true since
U ∈ TS(R) and HS_T(V) ⊂ HS_T(S).

On account of restriction (2), we see that HS_T is always nonempty.
Therefore if any two of (a), (b) or (c) hold simultaneously, we have a
conflict in P1' or P2'; hence P is not a symbol pair grammar.

Converse. Assume that P is not a symbol pair grammar. Then there exist
symbols U and V for which either P1' or P2' is double valued.

(d). P1'(U,V) is true implies (∃W)(∃u)(∃R)(∃S)(∃v)
     such that W -> uRSv with U ∈ TS(R) and V ∈ HS_T(S).

(e). P1'(U,V) is false implies (∃W)(∃u)(∃S)(∃v)
     such that W -> uUSv with V ∈ HS_T(S).

Now V ∈ HS_T(S) implies V ∈ HS(S) or V = S, thus (e) implies
U ≐ V or U <· V and (d) implies U ·> V, conflict.

(f). P2'(U,V) is true implies (∃W)(∃u)(∃S)(∃v)
     such that W -> uUSv with V ∈ HS(S).

(g). P2'(U,V) is false implies (∃W)(∃u)(∃v)
     such that W -> uUVv.

But (g) implies U ≐ V and (f) implies U <· V. Conflict, QED.

In terms of the general parsing algorithm, the precedence relations
can be thought of as a three-valued function P12'(X,Z) which is used
for both analysis loops. Replacing P1', it is false if it has value
<· or ≐ and true if ·>. Replacing P2', it is false if ≐ and
true otherwise ([25] p. 20). It is surprising to find that, even though
the defining matrix for P12' is twice as dense as the corresponding
matrices for P1' and P2' and also contains spurious relations due to the
over-restrictive fourth defining rule for simple precedence grammars,
no extra conflicts are introduced.
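
The substitution of P12' for the two Boolean functions can be written down directly. This is only a sketch of the mapping stated above; encoding the three relations as the strings "<", "=", ">" is an assumption made for the example.

```python
def p1_from_p12(relation: str) -> bool:
    """P1' is true only for the relation ·> (encoded here as ">")."""
    return relation == ">"


def p2_from_p12(relation: str) -> bool:
    """P2' is false only for the relation ≐ (encoded here as "=")."""
    return relation != "="
```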

In either case, the matrices defining the parsing functions turn
out to be rather sparse, and rather large. In the process of building
the parsing functions, we tabulate the symbols of V, and manipulate
instead the integers corresponding to their symbol table locations. As
suggested by Floyd ([7] p. 325) we can frequently find functions f1 and
g1 such that if P1'(U,V) is true, f1(U) > g1(V) and if P1'(U,V) is
false, f1(U) ≤ g1(V).

We can, of course, do the same for P2'. The advantage accrues in
requiring only 4 NSY memory locations for the tables defining the
functions f1, g1, f2, and g2 instead of the 2 NSY² locations required for
the matrices explicitly defining P1' and P2'. This is somewhat offset
by the fact that the Boolean matrices defining P1' and P2' could be
packed in digital memory. At present, all syntax checking is done by the
function P3' and the only error indication is that the CRS found is
not in the production table. If we retained the functions P1' and P2'
including the undefined values, we would have an additional (redundant)
method of error checking.

Let P be an arbitrary Boolean matrix (values 0 and 1). For
all X and Y, define

$$f(X) = \sum_{Y=1}^{NSY} 2^{(Y-1)} P(X,Y), \qquad g(Y) = 2^{Y} .$$

Then P(X,Y) = 1 if and only if (f(X) mod g(Y)) ≥ g(Y)/2. Thus we
can state that a relation always exists with which we can record the
content of a Boolean matrix P in two linear arrays. The relations "≤"
and ">" are adequate in practice.
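
The construction is easy to check numerically. The Python sketch below builds f and g for a small, arbitrary 3 by 3 matrix and recovers every entry from the two arrays. The function names are illustrative. The test is written with ≥ g(Y)/2, since a row whose only set bit among the low Y bits is bit Y-1 leaves a remainder exactly equal to g(Y)/2.

```python
def encode(P):
    """Return (f, g) for a square 0/1 matrix P, using 1-based X and Y as in the text."""
    nsy = len(P)
    f = [sum(2 ** (y - 1) * P[x][y - 1] for y in range(1, nsy + 1))
         for x in range(nsy)]
    g = [2 ** y for y in range(1, nsy + 1)]
    return f, g


def decode(f, g, x, y):
    """Recover P(x, y) (1-based indices) from the two linear arrays."""
    return 1 if f[x - 1] % g[y - 1] >= g[y - 1] // 2 else 0


P = [[1, 0, 1],
     [0, 0, 0],
     [1, 1, 0]]
f, g = encode(P)
recovered = [[decode(f, g, x, y) for y in range(1, 4)] for x in range(1, 4)]
```

This encoding is only an existence argument: f(X) needs NSY bits, so it saves no storage, which is why the comparison-based functions are the ones used in practice.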

We present the symbol pair syntax preprocessor in two forms. The
first is written in the kernel language presented in Section 3; the
second is the listing of the Burroughs B5500 Algol program actually
used to generate tables for the extendable compiler of Section 4. We
find it informative to compare the programs for conciseness and readability.
While the two programs accomplish essentially the same actions, the kernel
language version is approximately one half as long as the Algol version.
A detailed inspection of the program text reveals that the major savings
are in implicit table lookups (∈, ∉, index) and the generalized for
loop. In particular, there are 26 occurrences of the symbol for in the
kernel language version while the Algol version contains 35. Furthermore,
we find ten labels in the Algol version of which perhaps one half
are essential and none of which contribute to the reader's ability to
understand the program.

Since  the  kernel  language  is  discussed  in  detail  in  Section  3,  we 
will  say  nothing  further  about  it  here.  Burroughs  B5500  Algol  is  in  most 
respects  exactly  Algol  60.  The  input  and  output  conventions  are  relatively 
standard  except  for  the  following  features: 

(1) On line 7 of the program we see a file declaration for the
card  punch.  Its  function,  setting  aside  buffer  areas  for  the  card 
punch,  is  not  important  to  an  understanding  of  the  program. 

(2)  Three  lines  below  we  find  a  WRITE  statement  in  the  form  of 
a  procedure  call.  The  first  parameter  to  WRITE  is  a  format  which  is 
indicated  to  the  Algol  compiler  by  enclosing  the  format  in  the  brackets 
<  and  >.  All  the  remaining  parameters  are  values  to  be  written. 

In  the  middle  of  the  third  page  we  see  two  STREAM  procedures.  They 
are  an  interface  with  the  character  mode  machine  instructions  of  the 
B5500  used  to  set  and  interrogate  two-bit  fields  within  the  48  bit  B5500 
word.  Since  we  may  have  upwards  of  100  symbols  and  have  two  matrices 
with  that  number  squared  of  elements,  packing  the  values  is  unavoidable 
in  Stanford's  16  thousand  word  B5500  memory.  Packing  would  be  somewhat 
more  convenient  in  the  kernel  language  since  we  can  use  subscripts  to 
access  bit  strings  directly. 
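
The packing scheme can be sketched without the B5500 character mode instructions. The Python example below stores 24 two-bit fields in each 48-bit word, so that entry k of a row lives in word k // 24, field k % 24. Numbering the fields from the high-order end of the word is an assumption, and the function names merely echo the role of the two STREAM procedures.

```python
WORD_BITS = 48
FIELDS_PER_WORD = WORD_BITS // 2  # 24 two-bit fields per word


def set2bits(word: int, i: int, value: int) -> int:
    """Return word with two-bit field i (0..23, leftmost first) set to value."""
    shift = WORD_BITS - 2 * (i + 1)
    return (word & ~(0b11 << shift)) | ((value & 0b11) << shift)


def get2bits(word: int, i: int) -> int:
    """Return the value (0..3) of two-bit field i of word."""
    shift = WORD_BITS - 2 * (i + 1)
    return (word >> shift) & 0b11


def new_matrix(nsy: int):
    """An nsy-row matrix of two-bit entries, packed 24 to a word."""
    return [[0] * (nsy // FIELDS_PER_WORD + 1) for _ in range(nsy)]


def matrix_set(rows, j: int, k: int, value: int) -> None:
    rows[j][k // FIELDS_PER_WORD] = set2bits(
        rows[j][k // FIELDS_PER_WORD], k % FIELDS_PER_WORD, value)


def matrix_get(rows, j: int, k: int) -> int:
    return get2bits(rows[j][k // FIELDS_PER_WORD], k % FIELDS_PER_WORD)
```

With 100 symbols each row occupies 5 words, so one matrix needs 500 words instead of the 10,000 an unpacked matrix would require.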

Finally,  we  use  the  machine  clock  to  obtain  execution  time  infor¬ 
mation  for  the  user.  One  of  our  objectives  is  the  accumulation  of  precise 
timing  information  for  the  behavior  of  the  preprocessor  as  a  function  of 
the  number  of  productions  and  number  of  symbols.  Preliminary  data  gives 
the  surprising  conclusion  that  execution  time  is  a  linear  function  of 
the  number  of  productions  (about  2  seconds  per  production). 

We now give a narrative of the kernel language version of the
program. Our first action is to name all the identifiers local to the
main block and initialize P to the null set. We examine the first
character from the input medium and continue to read productions until
an end-of-file symbol is encountered. Our productions are character
strings whose length is a multiple of 12. The first 12 characters are
the leftpart of the production and the remaining fields are the symbols
of the rightpart. A carriage return delimits the production. Internally,
a production is an ordered set of strings, each element representing one
symbol in the production. We make special provision (if (length t) ≠ 0
then ...) for blank lines which can be used to increase the readability of
the production tables. We also print the productions to supply the user
with a record of his input.

If the leftpart of two successive productions is the same, we
allow the user to substitute a field of twelve blanks for the second
leftpart, again to increase readability. At the completion of input we
immediately repair the omission.
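
The input conventions just described can be sketched as follows. This is an illustrative Python reading routine, not a transcription of the kernel language program: it takes a list of text lines rather than a character stream, and it strips the blank padding from each 12-character field, which the original does not do.

```python
def read_productions(lines, width=12):
    """Split fixed-width lines into productions [left part, symbol, ...]."""
    productions = []
    for line in lines:
        line = line.rstrip("\n")
        fields = [line[i:i + width].strip() for i in range(0, len(line), width)]
        if not any(fields):   # blank line, present only for readability
            continue
        productions.append(fields)
    # Repair omitted left parts: a blank first field repeats the previous one.
    for i in range(1, len(productions)):
        if productions[i][0] == "":
            productions[i][0] = productions[i - 1][0]
    return productions


def card(*symbols, width=12):
    """Helper for the example: lay symbols out in fixed-width fields."""
    return "".join(s.ljust(width) for s in symbols)


EXAMPLE = read_productions([
    card("S", "a", "S", "b"),
    "",
    card("", "c"),
])
```

Here `EXAMPLE` comes out as `[["S", "a", "S", "b"], ["S", "c"]]`: the blank line is skipped and the blank left part of the second production is filled in from the first.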

Then, in three lines, we use the generalized for loop, set union
and set difference to build all the symbol tables that we will need.
Four more lines of program record them on the output medium.
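
In a language with built-in sets the same tables take a few lines. The sketch below mirrors the set operations of the kernel language program (V = VL ∪ VR, VT = VR - VL, VN = V - VT, VG = VL - VR). Python sets are unordered, unlike the ordered sets of the kernel language, so symbol table indices would need a separate sorted list; the function name is illustrative.

```python
def symbol_tables(productions):
    """productions: lists of [left part, symbol, ...]. Returns (V, VT, VN, VG)."""
    VL = {p[0] for p in productions}                 # symbols used as left parts
    VR = {s for p in productions for s in p[1:]}     # symbols used in right parts
    V = VL | VR          # whole vocabulary
    VT = VR - VL         # terminals: never a left part
    VN = V - VT          # nonterminals
    VG = VL - VR         # goal candidates: never in a right part
    return V, VT, VN, VG


V, VT, VN, VG = symbol_tables([
    ["G", "A", "B"], ["A", "X"], ["A", "X", "B"], ["B", "C"], ["C", "C", "Y"],
])
```

A grammar is well formed in this respect when VG contains exactly one symbol, which is what the "Unique leftmost symbol" message in the program reports.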

After excluding the possibilities of empty and repeated right parts,
it becomes advantageous to replace the production table with a new table
"PR" of identical format except that its elements are the indices of the
production symbols in the vocabulary V. We then complete our grammar
checks by excluding the possibility of a grammar with nonterminating
phrases (restriction 2).
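
The check for nonterminating phrases works by collapsing the grammar: terminals are deleted from every right part, and any production whose right part has become empty lets its left part be deleted everywhere else. Whatever survives cannot terminate. The Python sketch below follows that idea. It is a hedged reading of the kernel language program rather than a transcription, and the function name is an assumption.

```python
def nonterminating(productions):
    """Return the productions (terminals removed) that never collapse."""
    left_parts = {p[0] for p in productions}
    # Drop terminals: keep the left part plus nonterminal right-part symbols.
    work = [[p[0]] + [s for s in p[1:] if s in left_parts] for p in productions]
    change = True
    while change:
        change = False
        for t in list(work):
            if len(t) == 1:                      # right part has collapsed
                for p in work:
                    p[1:] = [s for s in p[1:] if s != t[0]]
                work = [p for p in work if p is not t]
                change = True
    return work


STUCK = nonterminating([
    ["G", "A", "B"], ["A", "X"], ["A", "X", "B"], ["B", "C"], ["C", "C", "Y"],
])
FINE = nonterminating([["S", "a", "S", "b"], ["S", "c"]])
```

For the example grammar G -> AB, A -> X, A -> XB, B -> C, C -> CY the result is nonempty, because C can only be rewritten as CY; for S -> a S b | c it is empty.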

We define procedures to compute head and tail symbols. Note that
we recompute the head and tail symbols repeatedly within the analysis
loop. In the processor for the (2,1)(1,2) grammars we adopt a suggestion
of N. Wirth to compute an "occurrence" matrix which need not be re-evaluated.
The latter is probably a superior approach.

We then initialize the matrices P1 and P2 to NSY squared
values undefined, and proceed to evaluate the functions P1' and P2'
according to the directions of the theory. For every pair of adjacent
symbols in the grammar (j and k in the program) we evaluate the tail
and head symbols. We record P2'(j,k) true and all of P2'(j,HS(k)) false.
Then we modify heads to become the set HS_T and evaluate P1' in the
same manner.

Our final task is the computation of Floyd's linearization functions
f and g. Our algorithm is modeled on that of N. Wirth [26] but is
simpler since our matrices are two valued instead of three valued.
Our algorithm proceeds to satisfy the requirements of the decision function
starting in the upper left corner of its defining matrix. We add a row
to the satisfied area (null to begin with) and call uprow to assure that
(1) f is large enough to satisfy all the requirements given by the value
false and (2) g is large enough to satisfy all the requirements given
by the value true. If we must change g we call upcol to readjust that
entire column.

It is possible to have functions P1' and P2' but still not have
a linearization for the relation pair ≤ and >. At any given stage of the
operation of the algorithm above, we know that the submatrix in the upper
left corner has been correctly linearized. Thus if we are going to fail,
the failure must involve one of the last relations added to consideration.
We can check within the adjusting procedures to see that we never return
to adjust one of the last relations added. If we do, we have failed and
print a diagnostic error trace indicating the exact reason for that
particular failure.
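
For comparison, the linearization can also be computed by plain fixed-point iteration. The Python sketch below is a swapped-in technique, not the row-by-row uprow/upcol scheme of the program: it raises f and g until every defined entry is satisfied, taking a true entry to require f > g and a false entry to require f ≤ g. It gives up when a value exceeds 2n, a bound chosen here on the assumption that a chain of requirements can visit each of the 2n function values at most once when a solution exists.

```python
def linearize(p):
    """p[i][j] is True (need f[i] > g[j]), False (need f[i] <= g[j]) or None.

    Returns (f, g), or None when no linearization exists.
    """
    n = len(p)
    f, g = [1] * n, [1] * n
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(n):
                if p[i][j] is True and f[i] <= g[j]:
                    f[i] = g[j] + 1       # raise the row value
                    changed = True
                elif p[i][j] is False and f[i] > g[j]:
                    g[j] = f[i]           # raise the column value
                    changed = True
                if f[i] > 2 * n or g[j] > 2 * n:
                    return None           # requirements chase each other forever
    return f, g


def satisfies(p, f, g):
    """Check every defined entry of p against the comparison f[i] > g[j]."""
    return all(p[i][j] is None or (f[i] > g[j]) == p[i][j]
               for i in range(len(p)) for j in range(len(p)))


OK = linearize([[True, False], [None, True]])
BAD = linearize([[False, True], [True, False]])
```

The second matrix has no linearization: its entries demand f1 ≤ g1 < f2 ≤ g2 < f1, which is exactly the circular situation the diagnostic trace in the program is meant to report.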

' Kernel language version of (1,1)(1,1) syntax preprocessor '

{ new j k k1 P PR NPR V NSY VL VR VT VN VG P1 P2 f g t heads tails
      fail beenatrowk beenatcolk HS TS upcol uprow change,

  P ← { },                               ' the null set of productions '
  while in[1] ≠ eof do
  { t ← { },                             ' the input loop, build a production '
    while in[1] ≠ cr do
    { t ← t ⊕ { in[1 to 12] },           ' fixed field, 12 characters '
      in ← in[13 to length in]
    },
    in ← in[2 to length in],
    if (length t) ≠ 0 then P ← P ⊕ {t},  ' add another production '
    out ← out ⊕ (⊕/t) ⊕ cr               ' print the production '
  },
  NPR ← length P, VL ← VR ← set { },
  for all i from 1 to NPR do             ' replace omitted left parts '
    (if P[i][1] = "            " then P[i][1] ← P[i-1][1]),
  for all t from P do
    ( VL ← VL ∪ { t[1] }, VR ← VR ∪ t[2 to length t] ),
  V ← VL ∪ VR, VT ← VR ⊖ VL, VN ← V ⊖ VT, VG ← VL ⊖ VR,
  NSY ← length V,
  out ← out ⊕ (if (length VG) ≠ 1 then "no " else "") ⊕
        "Unique leftmost symbol: " ⊕ (⊕/VG) ⊕ cr ⊕
        "Terminal symbols: " ⊕ (⊕/VT) ⊕ cr ⊕
        "Non terminal symbols: " ⊕ (⊕/VN) ⊕ cr,
  for all t from P do (if (length t) = 1 then
      out ← out ⊕ t[1] ⊕ " has an empty right part" ⊕ cr),
  for all i from 1 to NPR do for all j from i+1 to NPR do
    (if P[i][2 to ∞] = P[j][2 to ∞] then
      out ← out ⊕ "Productions " ⊕ cr ⊕
            (⊕/P[i]) ⊕ " and" ⊕ cr ⊕ (⊕/P[j]) ⊕ cr ⊕
            "have equal right parts" ⊕ cr),
  PR ← P,        ' convert production strings to symbol table locations '
  for all i from 1 to NPR do for all j from 1 to length P[i] do
      PR[i][j] ← P[i][j] index V,
  ' The final grammar check, for nonterminating phrases '
  for all i from 1 to NPR do P[i] ← P[i] ⊖ VT,
  change ← true,
  while change do                        ' now try to collapse grammar '
  { change ← false,
    for all t from P do (if (length t) = 1 then
      (for all i from 1 to NPR do
         for all j from 2 to length P[i] do
           (if P[i][j] = t[1] then
              P[i] ← P[i] ⊖ {t[1]}),
       P ← P ⊖ {t}, change ← true
      ))
  },
  if P ≠ { } then out ← out ⊕ "grammar includes a
      non terminating phrase" ⊕ (⊕/⊕/P) ⊕ cr,

  HS ← (p)
  { new s,
    for all t from PR do
      (if t[1] = s then if t[2] ∉ heads then
         { heads ← heads ⊕ {t[2]}, HS{t[2]} })
  },

  TS ← (p)
  { new s,
    for all t from PR do
      (if t[1] = s then if t[-1] ∉ tails then
         { tails ← tails ⊕ {t[-1]}, TS{t[-1]} })
  },

  P1 ← P2 ← NSY list (NSY list Ω),
  for all t from PR do for all i from 2 to (length t) - 1 do
  ( j ← t[i], k ← t[i+1], heads ← tails ← { },
    TS{j}, HS{k},
    if P2[j][k] = Ω then
    ( P2[j][k] ← 1,
      for all h from heads do
        if P2[j][h] = Ω then P2[j][h] ← 0 else
          (if P2[j][h] = 1 then out ← out ⊕
             "Conflict, P2[" ⊕ V[j] ⊕ "][" ⊕ V[h] ⊕ "]" ⊕ cr),
    ) else (if P2[j][k] = 0 then out ← out ⊕
             "Conflict, P2[" ⊕ V[j] ⊕ "][" ⊕ V[k] ⊕ "]" ⊕ cr),
    if V[k] ∈ VT then heads ← {k},       ' Now HST in heads '
    for all h from heads do (if V[h] ∈ VT then
      ( if P1[j][h] = Ω then P1[j][h] ← 1 else
          (if P1[j][h] = 0 then out ← out ⊕
             "Conflict, P1[" ⊕ V[j] ⊕ "][" ⊕ V[h] ⊕ "]" ⊕ cr),
        for all g from tails do if P1[g][h] = Ω then P1[g][h] ← 0 else
          (if P1[g][h] = 1 then out ← out ⊕
             "Conflict, P1[" ⊕ V[g] ⊕ "][" ⊕ V[h] ⊕ "]" ⊕ cr)
      ))
  ),

  uprow ← (p)
  { new i p,
    if beenatrowk ∧ i = k then fail ← true,
    beenatrowk ← beenatrowk ∨ i = k,
    for all j from 1 to k1 do
      (if f[i] ≤ g[j] then if p[i][j] = 0 then f[i] ← g[j] + 1),
    for all j from 1 to k1 do
      (if ¬ fail then if f[i] > g[j] then if p[i][j] = 1 then
         upcol{j, (p) p} ),
    if fail then out ← out ⊕ "row= " ⊕ V[i] ⊕ cr
  },

  upcol ← (p)
  { new j p,
    if beenatcolk ∧ j = k then fail ← true,
    beenatcolk ← beenatcolk ∨ j = k,
    for all i from 1 to k do
      (if f[i] > g[j] then if p[i][j] = 1 then g[j] ← f[i]),
    for all i from 1 to k do
      (if ¬ fail then if f[i] ≤ g[j] then if p[i][j] = 0 then
         uprow{i, (p) p} ),
    if fail then out ← out ⊕ "col= " ⊕ V[j] ⊕ cr
  },

  fail ← false, k1 ← 0,
  f ← g ← NSY list 0,                    ' Allocate storage to f and g '
  for all k from 1 to NSY do if ¬ fail then
  ( beenatrowk ← false, f[k] ← g[k] ← 1,
    uprow{k, (p) P2},
    k1 ← k, beenatcolk ← beenatrowk ← false,
    upcol{k, (p) P2}
  ),
  out ← out ⊕ "Linearized functions for P2:" ⊕ cr ⊕
    (for all i from 1 to NSY do (i base 10) ⊕ tab ⊕ V[i] ⊕
       (f[i] base 10) ⊕ tab ⊕ (g[i] base 10) ⊕ cr),
  fail ← false, k1 ← 0,
  for all k from 1 to NSY do if ¬ fail then
  ( beenatrowk ← false, f[k] ← g[k] ← 1,
    uprow{k, (p) P1},
    k1 ← k, beenatcolk ← beenatrowk ← false,
    upcol{k, (p) P1}
  ),
  out ← out ⊕ "Linearized functions for P1:" ⊕ cr ⊕
    (for all i from 1 to NSY do (i base 10) ⊕ tab ⊕ V[i] ⊕
       (f[i] base 10) ⊕ tab ⊕ (g[i] base 10) ⊕ cr)

} 'end of program'

The Algol version follows the kernel language version closely. We
have taken especial care to minimize conflict in memory use in the Algol
version. We provide three global quantities, MAXNPR, MAXNSY, and MAXLPR,
which determine the size of the tables in the program. Within the system
definition block (see the following diagram) we define the global arrays.
Our first action block is A, where the data cards are read and the various
tables built. In block B we check that the tables represent a grammar
according to the restrictions of the theory. In block C1 the recognition
functions are computed and in C2 the linearization is completed.

Block structure of the symbol pair analysis program

Outer block - system definition
    Global quantities
    Block A - grammar input
    Block B - grammar checks
    Block C
        Block C1 - compute functions P1' and P2'
        Block C2 - compute functions f1, g1, f2, g2

BEGIN COMMENT SYNTAX PROCESSOR, W. M. MCKEEMAN OCT. 1965;
INTEGER MAXNSY; COMMENT MAX NUMBER OF SYMBOLS;
INTEGER MAXNPR; COMMENT MAX NUMBER OF PRODUCTIONS;
INTEGER MAXLPR; COMMENT MAX LENGTH OF A PRODUCTION;
INTEGER EITHER, YES, NO, LE, GT;
INTEGER TI, OT; COMMENT TIMING INFORMATION;
FILE OUT CP 0(2,10);
PROCEDURE TIMER;
BEGIN INTEGER T; T ← TIME(1);
WRITE(<"TIME = ", F7.2, ", TOTAL ELAPSED = ", F7.2, " MIN.">,
(T-OT)/3600, (T-TI)/3600);
OT ← T;
END TIMER;

MAXNSY ← 300; MAXNPR ← 300; MAXLPR ← 5;
EITHER ← 0; YES ← 1; NO ← 2; LE ← 1; GT ← 2;
OT ← TI ← TIME(1);

BEGIN COMMENT SET UP GLOBAL TABLES;
INTEGER ARRAY V0, V1[0:MAXNSY]; COMMENT 12 SIG. CHARS;
INTEGER ARRAY PR[0:MAXNPR, 0:MAXLPR]; COMMENT PRODUCTIONS;
BOOLEAN ARRAY ONRIGHT[0:MAXNSY];
INTEGER NPR; COMMENT ACTUAL NUMBER OF PRODUCTIONS READ;
INTEGER NSYN, NSY; COMMENT ACTUAL NUMBER OF SYMBOLS READ;

BEGIN  comment  block  A)  COMMENT  CARD  INPUT  BLOCK) 

INTEGER  I*  J*  K*  L) 

LAREL  INPUTLUUH*  tUF*  FULNO) 

INTEGER  ARRAY  PO*  PttOIMAXNPR*  0  I  MAXLPR  3 1 

iNTEGtR  ARRAY  M T« l 0 IM A XNS Y T )  COMMENT  MASTER  TABLE) 

integer  array  pribioi  1022 d  comment  production  table) 

WRI fE(<"PHUUUC  I  IONS «"//>>) 

NPR  ♦  0) 
iNPUTLOOPl 

RE  Al)(  <  1 2  A6>  *  FUR  K  f  0  SUP  1  UNTIL  MAXLPR  00 
(PCfNPR  +  1*  M*  PllNPH*!*  KIDIEOFI) 

IF  POlNPHM*  li  «  "  "  THEN  RR!TE(<"  ">)  ELSE 

BEGIN  NPR  «■  NPR  ♦  D 

WNl  TE(<Ifl*X«»2A6»"  «■  "*  10A6>*  NPR* 

FOR  K  ♦  fl  STEP  1  UNTIL  MAXLPR  DO  { PCI NPR * K ] *  PI ( NPR# K  )]  )) 
END) 

GO  TO  INPUTlOUP) 

EOF  I 

NSY  ♦  01  VCICJ  ♦  V I C  C 1  ♦  «  "I 

FOR  K  ♦  C  STEP  1  UNTIL  "AXLPR  CO 
BEGIN  FOR  I  ♦  1  SUP  1  LET  1 1  NPR  PO 
BEGIN  FOR  J  ♦  0  STEP  1  UNTIL  NSY  1C 

IF  PC  1 1  *  N  T  *  VO  £  J  J  AND  P1II*K)  =  VltJ]  THEN 
GO  TO  FuONO) 

J  ♦  NSY  «■  NbY  ♦  1)  VOTJ]  ♦  PCf  I  *K  3 1  VltJ]  ♦  PltI*K)l 
FUONO  t 

PH  C I #  K  J  ♦  J) 

IF  K  *  0  ThLN  UNRiGHTIJJ  ♦  TRUE) 


1*0 


vi# 


LNU  I! 

If  K  i  (I  Ir-tK  NSYN  «•  K  is  Y  J 

eno  k; 

FOR  I  ♦  2  S  TtP  1  UNTIL  NPK  DO  IF  PRU»C)  ■  0  THEN 
PNII,0)  ♦  PK( 1* 1 »  0  )  J 

KHITECIPAGE  ))!  NKI1E<<"INTERMC0IAT£  SVMBCL S »">) I 
HhITEl<S(l8#Xj»2At)>#  FOR  1  ♦  1  STEP  I  UNTIL  NSYN  OC 

u#  vorn#  viini); 

HP  I  It  («//*•  1 1.  K ►  INAL  SY#-BILS»">)! 

HPIU(<5(  I6»X3#2A6)>#  FUR  I  ♦  N$YN*l  STEP  1  UNTIL  NS Y  DO 

u#  you  i#  vimni 

HPIlttCP#<,,F  ILL  VOt*J  MUH  0#"#  6  ("""#  A6#  "«"»  «  )/ 

(«(•••'*»,  A*,  *  I  ♦  1  STEP  1  UNTIL  NSY  QU  VOII))! 

HRITL(CP#<*,FUL  v  1 1 *  j  mITH  0#"#  6(mnntfi6tmmm’,,t")/ 
C8(,,,,,,#A6#,’*,"#"#,,)>>»FnH  I  «•  1  STEP  1  UNTIL  NSY  00  VIUDI 

L  «■  o; 

FOR  I  ♦  1  STEP  1  UNI IL  NSY  DO 
BEGIN  NTBUJ  *  LMJ 

FOP  J  ♦  }  SltP  1  UNTIL  NPR  CO  IF  PKId#l)  *  I  THEN 
BEGIN  FCJH  K  *  k  STEP  1  UNTIL  M  AXLf’R  00 

IF  Ph(d#Kj  f  0  THEN  PRT6(L  +  LM)*PR[J#K)! 

PR  T  H  l  L*l  ♦  1 J  ♦  -J!  PRTRll+LM)  ♦  PRTv#0)! 

LNU  J! 

PH  IfHL  +  L  +  l  1  ♦  o; 

END  1# 

MR  ITt iCP#  <*F  ILL  PPTb  [  *  1  KITH  C#"#  10(14#"#" >/ 

<"  "#14(  l<t#"»M>)>»  FUR  1  ♦  1  STEP  1  UNTIL  L  OC  PRTBCIJ)I 
MR  I TE (CP#  <"F  ILL  MlktC*)  WITH  "#  13(1  3#"#")/ 

("  w  #  1 7  C  13#**#"  ))>#U#F  UR  I  «•  1  STEP  1  UNTIL  NSY  00  MT8UJ)J 
HHITt(CP#<"NSY  ♦  W#I3,  ")  NSYN  ♦  ",  13#  "j  NPRTB  ♦  "#  I3,"!">, 
NSY#  N  S Y N #  L)J 
ENO  BLOCK  A; 


BEGIN  CUMMENT  BLOCK  Bi  COMMENT  GRAMMAR  CHECKS! 

INTEGER  If  J#  K! 

LABEL  OK! 

J  ♦  0  i 

f or  i  <-  i  step  i  until  nsyn  oo  if  not  onhighhi)  then 

BEGIN  J  ♦  J  ♦  1  i 

NN1TE(</mTUE  UmuUE  TARGET  SYMBOL  J  S  t  ",  ?A6>#  YO 1 1  I#V1|  ID! 
ENO  I » 

IF  J  *  1  THEN  WHI I tC<**THEME  IS  NO  UNIQUE  LEFTMOST  SYMBOL**)! 

FON  I  ♦  1  SI EP  I  UNTIL  NPR  DO 

BEGIN  COWMEN  t  CHECK  FOR  EMPIY  LEFT  ANO  RIGHT  PARTS! 

IF  PHt I#OJ  e  0  THEN  . 

NNIU(<*PKUUUCf  ION  ",  J#  "  PAS  AN  EMPTY  LEFT  PAHT">#I)! 

IF  PH C I # 1 J  ■  0  THEN 

WHITE  (<"PRUUUCTiON  ",  Jf  "  HAS  aN  EMPTY  RIGHT  PART"*#!)! 
FUR  j  ♦  !♦!  SUP  1  UNTIL  NPR  00 


BEGIN  COMMENT  ChtCK  f OR  IDENTICAL  RIGHT  PARTS! 

FUR  K  *  1  alLP  1  UKIIL  MAXl.PR  Oil  IF  PRM#KJ  A  PR[J#K}  THEN 

GO  T.J  UK > 

Wi<ITE(<*,HKllUUCUQNS  ".J#"  AND  J» 

"  MUST  DISTINGUISHED  TTY  The  INTERPRETATION  RUlES"># I# J)l 

UK  t 

EMj  Ji 

End  i; 

TIMER) 

ENU  BLOCK  H) 


BEGIN COMMENT BLOCK C; COMMENT SYNTAX ANALYSIS;
ALPHA ARRAY P1, P2[0:NSY, 0:NSY DIV 24];
COMMENT PACKING AND UNPACKING PROCEDURES;

STREAM PROCEDURE SET2BITS(W, I, V); VALUE I;
BEGIN DI ← W; 2(SKIP I DB); SI ← V; SKIP 46 SB;
2(IF SB THEN DS ← SET ELSE DS ← RESET; SKIP SB);
END SET2BITS;

INTEGER STREAM PROCEDURE GET2BITS(W, I); VALUE I;
BEGIN DI ← LOC GET2BITS; SKIP 46 DB; SI ← W; 2(SKIP I SB);
2(IF SB THEN DS ← SET ELSE DS ← RESET; SKIP SB);
END GET2BITS;

BEGIN  COMMENT  BLUCK  C  If  COMMENT  COMPUTE  PRECEDENCE  RELATIONS! 
IMEGER  ARRAY  HEADS#  T  T I  LS  C  0  I  N  S  Y  1 1 
integer  c#  m,  i,  j,  *,  l#  lc#  rc#  t#  div24#  moo24! 
boolean  FAIL! 

LAbEL  SKIPf'l#  SKIPP2#  DONE! 

PROCEDURE HS(S); VALUE S; INTEGER S;
BEGIN COMMENT FIND THE LEFTMOST SYMBOLS OF S;
INTEGER I, J, K;
LABEL GOTITALREADY;
FOR I ← 1 STEP 1 UNTIL NPR DO IF PR[I,0] = S THEN
BEGIN K ← PR[I,1];
FOR J ← 1 STEP 1 UNTIL LC DO IF HEADS[J] = K THEN
GO TO GOTITALREADY;
LC ← LC + 1; HEADS[LC] ← K; HS(K);
GOTITALREADY:
END I;
END HS;

PROCEDURE TS(S); VALUE S; INTEGER S;
BEGIN COMMENT FIND THE RIGHTMOST SYMBOLS OF S;
INTEGER I, J, K;
LABEL GOTITALREADY, R;
FOR I ← 1 STEP 1 UNTIL NPR DO IF PR[I,0] = S THEN
BEGIN FOR J ← MAXLPR STEP -1 UNTIL 1 DO IF PR[I,J] ≠ 0 THEN
GO TO R;
R: K ← PR[I,J];
FOR J ← 1 STEP 1 UNTIL RC DO IF TAILS[J] = K THEN
GO TO GOTITALREADY;
RC ← RC + 1; TAILS[RC] ← K; TS(K);
GOTITALREADY:
END I;
END TS;

PROCEDURE  CUNFLICT(I#J#M>!  INTEGER  I# J#M| 

BEGIN  INTEGER  Cl 
TAIL  ♦  TRUE! 

RRITE(IN0)#<X29#,,/">)| 

HHITE(<"CUNH.ICT»  "#  2A6#  "  "#  Al*  "  AND  ”  A1»X2#2A6>» 
VOCIWVKIJ#  H#.S  V0[J)»V1(J1)I 
END  CONFLICT! 

FAIL  ♦  FALSE! 

FOR  I  ♦  I  STEP  1  UNTIL  NPR  DO  FOR  L  +  2  STEP  1  UNTIL  MAXLPR  DO 
BEGIN 

J  ♦  PHI  I #L"1  jl  K  ♦  PRt I »L II 

IF  K  >  0  THtN  GO  TC  OONEI 

0IV2A  ♦  K  OlV  24!  MQ024  ♦  K  MOO  24! 

LC  ♦  RC  ♦  0! 

T  S  C  J  > I  HS(ft)l 

T  ♦  GET2t*mcP2lJ«0lV24]»M0D24)l 
IF  T  «  YES  I  HEN  GO  TO  SKIPP2I 
IF  T  «  NO  THEN  CONFlICT(J*K#NN")  ELSE 
SET2BITSCP2I J»OI  V24  ]»MQ024» YES )! 

FOR  H  ♦  I  SI EP  1  UMIL  LC  00 

BEGIN  0IV24  ♦  hEAOS(H)  0 1 V  24!  M0024  ♦  HE ADS C H I  MOD  24! 

IF  GET2biTS(P2U*0lV24)»MOO24)  •  YES  THEN 
C0NFLICT(J»HEADS(H)#"N«)  ELSE 
SET2UlTS(P2tJ»DlV24)#K0024»N0)! 

END  HI 
SKIPP2I 

IF  K  >  NSYN  THEN 

BEGIN  COMMENT  IF  ”K"  IS  TERMINAL  HE  MUST  TABULATE  IT! 

LC  ♦  LC  ♦  II 
HEAOSILCJ  ♦  HI 
END! 

FOR  H  ♦  1  S 1  EP  1  UNTIL  LC  DO 

BEGIN  CUHMLNT  ONLY  TERMINAL  SYMBOLS  ARE  INVQLVEOI 
IF  HEAOSIH]  S  NSYN  THEN  GO  TO  SK1PP1! 

0IV24  ♦  HEAOSIH)  OIV  24!  M0024  ♦  HEAOSIH]  MOO  24| 

IF  GET2dnS(PlCj»0IV24]»MO024)  •  NO  THEN 
CONFLiCT(J»HEAO$IN]»NS")  ELSE 
SET2diTS(PlU»DIV24]»M0024*YES)! 

FOR  G  ♦  1  S I  EP  1  UNTIL  RC  00 

IF  GEf2tjlTSCPlITAlLS[G],0IV24]«M0D24)  •  YES  THEN 
CQNFLlCT(TAILS(G]»HEAOSm»"S”)  ELSE 
SE12HIT$(PltTAILSlG)*0IV24 J#M0D24#N0>! 

SRIPPil 
ENO  H! 

OONEI 
ENO  L  I! 

IF  NOT  FAIL  THEN  MRITE(</"NC  CONFLICTS  HERE  FOUNO^)! 

TIMER! 


END  BLOCK  C  II 


BEGIN  COMMENT  BLOCK  C  21  COMMENT  LINEARIZE  MATRICES! 
INTEGER  ARRAY  F#  G[OlN$YJl 
INTEGER  K*  All 

BOOLEAN  fail#  BELNATHCmK#  BEENATCOLKI 

•HOCEOURE  UPCUL(J#P)I  VALUE  Jl  INTEGER  Jl  ALPHA  ARRAY  PE0#0) J 
FORRARDI 

PROCEDURE  UPRUR ( I  #P )  I  VALUE  U  INTEGER  II  ALPHA  ARRAY  P[O#0Jl 
BEGIN  INTEGLR  Jl 

IF  BEENATHQuK  and  I  3  K  THEN  FAIL  ♦  TRUE  I 

BEENATRORK  ♦  BEENATRURK  CR  I  •  Kl 

FOR  J  ♦  l  SIEP  1  UNTIL  Kl  DO  IF  Fill  1  G C J 1  THEN 

IF  UC T 2B1 TS(P(  I » J  0 1 V  24I#ENTIER(J  MOD  24))  ■  GT  THEN 
FlIJ  ♦  GT J]  ♦  jl 

FOR  J  ♦  l  SIEP  1  UNTIL  Kl  00  IF  NOT  FAIL  THEN 
IF  FlU  >  Glj)  THEN 

IF  GEI2BITS(P[ I# J  DIV  24)#ENTIER(J  MOO  24))  ■  LE  THEN 
UPCULl J»  P)| 

IF  FAIL  THLN  RRlTE(<"R0w  ■  "#13#  "  ",  ?A6># I#VO(  I  J#V|CII)I 

ENO  UPRORI 


PROCEDURE  UPCUL(JfP)!  VALUE  Jl  INTEGER  Jl  ALPHA  ARRAY  P10»0JI 
BLtilN  INTEGER  i,  J0IV2A»  JM0024I 

IF  BEENATCOLK  ANO  J  *  K  THEN  FAIL  ♦  TROEI 

BEENATCOLK  ♦  BEENATCOLK  CR  J  >  Kl 

JDIV24  ♦  J  U1V  24 1  JM0024  ♦  J  MOD  241 

FOR  I  ♦  1  SIEP  1  UNTIL  K  00  IF  Ft  1 ]  >  6[J1  THEN 

IF  GET2BiTS(Pl  I#  JDIV24)#  JM0024)  ■  LE  THEN  GlJ]  ♦  FUJI 
FOR  I  ♦  l  STEP  1  UNTIL  K  00  IF  NOT  FAIL  THEN 
IF  F 1 1 )  S  Glj)  THEN 

IF  GEI2BI TSCFl  I# JOI V24  ), JM0024 )  ■  GT  THEN 
JPHJHl I ,  P)t 

IF  FAIL  THEN  RR1TE(<"C0L  ■  ",I3,  « 

ENO  UPCOLI 


'»  2A6>#J#V0CJJ#VllJ ))l 


FAIL  ♦  F ALl-ii  Kl  ♦  01  RRITEC  [PAGE])! 

FOR  K  ♦  l  STEP  l  UNTIL  NSY  00  IF  NUT  FAIL  THEN 
BEGIN  BEtMTHUNK  ♦  FALSE!  F  l K  ]  ♦  G[K)  ♦  II 
UPR0R(K*P2)I 

Kl  ♦  Kl  dELNAlCOLK  ♦  BEENATRORK  ♦  FALSE! 

UPC0L(K«P2J! 

ENO  Kl 

IF  FAIL  THiN 

RITt(<"LlNLARIZATlTJN  FAILURE  FOR  FUNCTIONS  BELGW">)J 
RRITE(<nL1NEARIZEU  PRCUUCTICN  RECOGNITION  MATRIX!"/ 
X7#"N0t"»X9#"SYMBUL,,#Xl0»"F"#X7#"U"/(IlQ,X6#2A6#2IG)># 


FU«  K  ♦  1  STEP  l  UNTIL  NSY 
Rhi T£(CP#R"F  ILL  F2t*J  WITH 
FUR  K  ♦  I  STEP  l  UNTIL  NSY 
RRITE<CP#<"FILL  G2[*J  WITH 
fUrt  K  ♦  l  STEP  l  UNTIL  NSY 


00  [K#VOtK]«VltK)#FCK)»G[K)))J 
0»",  16(!2*"»")/(24(I2*"f"))># 
00  FIK))I 

0>n»  1A(I2*"»")/(24(I2»",«))>, 
00  G[K  ] )! 


FAIL  ♦  FALUJ  K 1  ♦  01  TIMER)  NRK  TE(  (  PAGE  1)1 
FOH  K  f  1  51EP  1  UNTIL  NSY  00  IF  NOT  FAIL  THEN 
BEGIN  ULLMATKUMK  ♦  FALSE)  F[K]  ♦  (,[K)  ♦  1) 

UPHUH(*#P1  )> 

K1  «>  K)  HELNATCOLK  +  REENATHOWK  ♦  FALSE) 

UPC0L(K#P1)) 

ENU  K) 

IF  fail  then 

WRlIE(<BLINtANIZAl IUN  FAILURE  FOR  FUNCTIONS  BELO*">>) 

nhitec<"lineahized  hiehahchy  analysis  matrix*"/ 

X7#"i»0i%XV»wSYHBUL"»X10#"Fm»X7«nGn/(I10»X6»2A6»2IB)>» 
FOH  K  f  1  SIEP  t  UNTIL  NSY  00  IK#VO(K)»VltK]#F(K]#C[K]])) 
NHITE(CP#<hHLL  FU*1  NITH  0*"#  |<J(I2#"#")/(2ACI2#"»"))># 
FOH  K  ♦  I  STEP  1  UNTIL  NSY  00  FfKJ)l 

NHITE(CP#<wFILL  G1 C  *  1  N I TH  0#"#  18( 1 2» "# * ) / C 24  (  1 2,  ) )># 

FOH  K  ♦  1  srr.P  X  UNTIL  NSY  00  G FK 1 )) 

ENO  BLOCK  C  2) 

ENO  BLOCK  C) 

ENO) 

T1PER) 

ENO* 


(2,1)(1,2) Parsing Functions


Consider the grammar
P = {G → AB, B → BC, B → C}

The symbol B is left recursive, that is, B ∈ HS(B). From the first
production we can derive a conflict in P2'. Similarly, if A had been
right recursive, we would have had a conflict in P1' from the first
production. We can sum up both situations by saying that an internal
recursion will always cause a conflict. Note that the grammar
P = {G → AB', B' → B, B → BC, B → C}

has no internal recursion and is a symbol pair grammar. While we must
reject arbitrary grammar transformations on semantic grounds, the
insertion of a dummy production does not affect the semantic
interpretation of the language. The reader will note several such dummy
productions in the grammar of our kernel language.

We would like to extend the range of our grammars without requiring
additional work by the programmer. It is perfectly feasible to test for
internal recursions and automatically insert dummy productions into the
grammar prior to starting the analysis of the syntax.
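The automatic insertion of dummy productions can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions, not the preprocessor's own code: a grammar is taken to be a list of (left part, right part) pairs, a symbol is called internally recursive when it is left recursive and has a left neighbour in some right part (or right recursive with a right neighbour), and the names reach, heads, tails, and insert_dummy_productions are hypothetical. The dummy symbol is formed by appending a prime, which assumes no such symbol already exists.

```python
def reach(grammar, start, pick):
    """Symbols reachable from `start` by repeatedly applying pick(right part)."""
    seen, stack = set(), [start]
    while stack:
        sym = stack.pop()
        for lhs, rhs in grammar:
            if lhs == sym and pick(rhs) not in seen:
                seen.add(pick(rhs))
                stack.append(pick(rhs))
    return seen


def heads(grammar, s):
    """HS(s): all leftmost symbols derivable from s."""
    return reach(grammar, s, lambda rhs: rhs[0])


def tails(grammar, s):
    """TS(s): all rightmost symbols derivable from s."""
    return reach(grammar, s, lambda rhs: rhs[-1])


def insert_dummy_productions(grammar):
    """Replace internal occurrences of recursive symbols by dummy symbols."""
    nonterminals = {lhs for lhs, _ in grammar}
    left_rec = {n for n in nonterminals if n in heads(grammar, n)}
    right_rec = {n for n in nonterminals if n in tails(grammar, n)}
    result, dummies = [], {}
    for lhs, rhs in grammar:
        new_rhs = []
        for pos, sym in enumerate(rhs):
            internal_left = sym in left_rec and pos > 0
            internal_right = sym in right_rec and pos < len(rhs) - 1
            if internal_left or internal_right:
                # Assumed naming scheme: B becomes B'.
                sym = dummies.setdefault(sym, sym + "'")
            new_rhs.append(sym)
        result.append((lhs, tuple(new_rhs)))
    # One dummy production B' -> B per replaced symbol.
    for sym, dummy in sorted(dummies.items()):
        result.append((dummy, (sym,)))
    return result


grammar = [("G", ("A", "B")), ("B", ("B", "C")), ("B", ("C",))]
for lhs, rhs in insert_dummy_productions(grammar):
    print(lhs, "->", " ".join(rhs))
```

Applied to the first grammar above, the sketch yields the productions of the second grammar; applying it a second time changes nothing, since B then occurs only in the leftmost position.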

A perhaps more hopeful approach is to extend the view of the functions
P1' and P2'. It happens that internal recursions are allowed
if we look left one extra symbol for P1 and right one extra symbol for
P2. Extending the notation of Wirth and Weber ([25], p. 52) we call the
symbol pair grammars (1,1)(1,1) canonical parse grammars and the suggested
extension (2,1)(1,2) canonical parse grammars.

A (2,1)(1,2) syntax preprocessor is considerably more complicated
than that for a (1,1)(1,1) grammar. In particular, the defining matrices
for P1"(X,Y,Z) and P2"(X,Y,Z) contain NSY cubed elements. Even
though the density of defined entries is on the order of one percent,
a moderately large grammar may require 10,000 entries. It is encouraging
to note that no naturally occurring grammar has failed to be a (2,1)(1,2)
grammar.
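The storage estimate is easy to check by direct arithmetic. The short Python sketch below assumes the one percent density quoted above; the function name defined_entries and the sample symbol counts are illustrative only.

```python
def defined_entries(nsy, percent=1):
    """Estimated defined entries in one NSY x NSY x NSY parsing matrix."""
    return nsy ** 3 * percent // 100


for nsy in (16, 50, 100):
    # Columns: number of symbols, total matrix elements, defined entries.
    print(nsy, nsy ** 3, defined_entries(nsy))
```

With 100 symbols the matrix has one million elements, of which about 10,000 are defined, which agrees with the figure given in the text.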

The rules for deriving the values of P1" and P2" are similar
to those for deriving P1' and P2'. We will first tabulate the various
set definitions required for the derivations and then state the rules
derived from certain standard production formats.

Set definitions.

Canonical parse tail two symbols
T2S(P) = {(X,Y) | (∃u)(∃R) P → uXR, R ⇒ Y} ∪
         {(X,Y) | (∃u)(∃R) P → uR, (X,Y) ∈ T2S(R)}

Canonical parse head two symbols
H2S(P) = {(X,Y) | (∃u)(∃R) P → XRu, (Y = R or Y ∈ HS(R))} ∪
         {(X,Y) | (∃u)(∃R) P → Ru, (X,Y) ∈ H2S(R)}

Allowed predecessors
AP(P) = {X | (∃Q)(∃R)(∃x)(∃y) R → xXQy, (P = Q or P ∈ HS(Q))}

Allowed successors
AS(P) = {X | (∃Q)(∃R)(∃x)(∃y) R → xQXy, (P = Q or P ∈ TS(Q))}
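The four set definitions can be exercised on the small grammar G → AB', B' → B, B → BC, B → C discussed earlier. The Python sketch below is an illustration only. It assumes that the derivation relation used in T2S means "Y is R itself or is reached from R through productions whose right part is a single symbol", and all function names (hs, ts, derivatives, t2s, h2s, ap, as_) are hypothetical.

```python
GRAMMAR = [("G", ("A", "B'")), ("B'", ("B",)),
           ("B", ("B", "C")), ("B", ("C",))]


def reach(g, start, pick, only_single=False):
    """Symbols reachable from `start` through pick(right part)."""
    seen, stack = set(), [start]
    while stack:
        sym = stack.pop()
        for lhs, rhs in g:
            if lhs != sym or (only_single and len(rhs) != 1):
                continue
            if pick(rhs) not in seen:
                seen.add(pick(rhs))
                stack.append(pick(rhs))
    return seen


def hs(g, s):
    return reach(g, s, lambda rhs: rhs[0])          # head symbols


def ts(g, s):
    return reach(g, s, lambda rhs: rhs[-1])         # tail symbols


def derivatives(g, r):
    # Assumed reading: R itself plus its single character derivatives.
    return {r} | reach(g, r, lambda rhs: rhs[0], only_single=True)


def t2s(g, p, seen=None):
    """Canonical parse tail two symbols of p."""
    seen = set() if seen is None else seen
    seen.add(p)
    pairs = set()
    for lhs, rhs in g:
        if lhs != p:
            continue
        if len(rhs) >= 2:                           # P -> uXR
            pairs |= {(rhs[-2], y) for y in derivatives(g, rhs[-1])}
        if rhs[-1] not in seen:                     # P -> uR
            pairs |= t2s(g, rhs[-1], seen)
    return pairs


def h2s(g, p, seen=None):
    """Canonical parse head two symbols of p."""
    seen = set() if seen is None else seen
    seen.add(p)
    pairs = set()
    for lhs, rhs in g:
        if lhs != p:
            continue
        if len(rhs) >= 2:                           # P -> XRu
            pairs |= {(rhs[0], y) for y in {rhs[1]} | hs(g, rhs[1])}
        if rhs[0] not in seen:                      # P -> Ru
            pairs |= h2s(g, rhs[0], seen)
    return pairs


def ap(g, p):
    """Allowed predecessors of p."""
    return {rhs[i - 1] for _, rhs in g for i in range(1, len(rhs))
            if p == rhs[i] or p in hs(g, rhs[i])}


def as_(g, p):
    """Allowed successors of p."""
    return {rhs[i + 1] for _, rhs in g for i in range(len(rhs) - 1)
            if p == rhs[i] or p in ts(g, rhs[i])}


print(sorted(t2s(GRAMMAR, "G")))
print(sorted(h2s(GRAMMAR, "G")))
print(sorted(ap(GRAMMAR, "C")), sorted(as_(GRAMMAR, "B")))
```

The recursive procedures carry a set of visited symbols so that a recursive symbol such as B is expanded only once.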


Derivation rules for the parsing function values.

P1"

W → UVv       (∃X)(∃Z) Z ∈ HST(V), X ∈ AP(W) implies
              P1"(X,U,Z) = false.

W → UVv       (∃X)(∃Y)(∃Z) Z ∈ HST(V), X ∈ AP(W),
              U →⇒ Y implies P1"(X,Y,Z) = true.

W → UVv       (∃X)(∃Y)(∃Z) Z ∈ HST(V), (X,Y) ∈ T2S(U)
              implies P1"(X,Y,Z) = true.

W → tTUVv     (∃Z) Z ∈ HST(V) implies P1"(T,U,Z) = false.

W → tTUVv     (∃Y)(∃Z) Z ∈ HST(V), U →⇒ Y implies
              P1"(T,Y,Z) = true.

W → tTUVv     (∃X)(∃Y)(∃Z) Z ∈ HST(V), (X,Y) ∈ T2S(U)
              implies P1"(X,Y,Z) = true.

P2"

W → tTU       (∃S) S ∈ AS(W), (∃Z) Z ∈ HST(S) implies
              P2"(T,U,Z) = false.

W → tTU       (∃S) S ∈ AS(W), (∃Y)(∃Z) Z ∈ HST(S),
              U →⇒ Y implies P2"(T,Y,Z) = true.

W → tTU       (∃Y)(∃Z) (Y,Z) ∈ H2S(U) implies
              P2"(T,Y,Z) = true.

W → tTUVv     P2"(T,U,V) = false.

W → tTUVv     (∃Y)(∃Z) Z ∈ HST(V), U →⇒ Y implies
              P2"(T,Y,Z) = true.

W → tTUVv     (∃Y)(∃Z) (Y,Z) ∈ H2S(U) implies
              P2"(T,Y,Z) = true.

The block structure of the (2,1)(1,2) preprocessor is similar to
that of the symbol pair preprocessor. We organize the set definitions
for allowed predecessors, allowed successors, single character derivatives
(Y →⇒ Z), head symbols and tail symbols as Boolean matrices. If, for
example, AP[I,J] = true then symbol number J is an allowed predecessor
of symbol number I. We gain by avoiding table look ups and lose by
being forced to pack the matrices. The blocks C1, C2, and C3 contain
relatively transparent algorithms for the computation of the five sets.
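The trade between table look ups and packing can be illustrated with a small bit-packed Boolean matrix. The Python sketch below is not the B5500 stream procedure code; it only mimics the indexing scheme of one row per symbol and 48 bits per word (the J DIV 48 and J MOD 48 of the program listing). The class name PackedBoolMatrix and the bit order within a word are assumptions.

```python
WORD_BITS = 48  # B5500 word length, as implied by DIV 48 / MOD 48 in the listing


class PackedBoolMatrix:
    """Boolean matrix with rows 0..n, each row packed into 48-bit words."""

    def __init__(self, n):
        self.words_per_row = n // WORD_BITS + 1
        self.rows = [[0] * self.words_per_row for _ in range(n + 1)]

    def set(self, i, j):
        # Word j DIV 48 of row i, bit j MOD 48 (bit order is an assumption).
        self.rows[i][j // WORD_BITS] |= 1 << (j % WORD_BITS)

    def get(self, i, j):
        return bool((self.rows[i][j // WORD_BITS] >> (j % WORD_BITS)) & 1)


ap = PackedBoolMatrix(300)
ap.set(5, 100)           # symbol 100 is an allowed predecessor of symbol 5
print(ap.get(5, 100), ap.get(5, 99), ap.words_per_row)
```

Membership becomes a single indexed bit test, at the cost of the shifting and masking needed to reach the bit.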

Block C4 delineates the algorithm for computing the function P1".
Consider an arbitrary canonical derivation Y ⇒ t where t ∈ V_T*.
For every intermediate stage of the derivation (such that it has at
least two symbols) the pair of rightmost two symbols of the produced string
is an entry in the canonical parse tail two symbols of Y. The procedure
T2S tabulates pairs of tail symbols over all possible derivations
emanating from its argument. Storage requirements force us to abandon the
Boolean matrix definition for these sets and we tabulate them in a linear
array. It is also infeasible to record the values of P1" in a three
dimensional matrix; hence we record the values in four linear arrays, the
first three giving the coordinates of the point and the fourth its value.
At the innermost loop of the analysis (nested within four FOR's and five
IF's) we find a call on procedure ENTER which records the computed value.
Since the speed of execution of the algorithm is proportional to the speed
of ENTER, we have attempted to code it efficiently. The first implication
is the need for a binary table look up which itself demands that the three
coordinate arrays be packed in a single word as the polynomial value
I × N² + J × N + K where N > NSY. Secondly we use even powers of two
and Burroughs B5500 partial word operators instead of multiplies and divides
as indicated in the comments.
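The packing and the binary look up can be sketched as follows. This Python fragment is an illustration under assumptions rather than a transcription of ENTER: it takes N = 2^12, so that shifts replace multiplication and division, uses the standard bisect module for the binary search, and the names pack, unpack, and ValueTable are hypothetical.

```python
import bisect

SHIFT = 12  # assumed field width: N = 2**12 = 4096, which must exceed NSY


def pack(i, j, k):
    """Key I*N**2 + J*N + K, formed with shifts instead of multiplies."""
    return (i << (2 * SHIFT)) | (j << SHIFT) | k


def unpack(key):
    mask = (1 << SHIFT) - 1
    return key >> (2 * SHIFT), (key >> SHIFT) & mask, key & mask


class ValueTable:
    """Sorted table of packed coordinates with their Boolean values."""

    def __init__(self):
        self.keys, self.values = [], []
        self.conflicts = []   # coordinates entered with both truth values
        self.finds = 0        # repeated entries, kept for statistics

    def enter(self, i, j, k, x):
        key = pack(i, j, k)
        pos = bisect.bisect_left(self.keys, key)    # binary look up
        if pos < len(self.keys) and self.keys[pos] == key:
            if self.values[pos] != x:
                self.conflicts.append((i, j, k))
            self.finds += 1
            return
        self.keys.insert(pos, key)                  # keep the table sorted
        self.values.insert(pos, x)


table = ValueTable()
table.enter(2, 14, 8, True)
table.enter(2, 14, 9, True)
table.enter(2, 14, 8, False)      # same coordinates, opposite value
print([unpack(k) for k in table.keys], table.conflicts)
```

Because the three coordinates share one word, a single integer comparison orders the entries, which is what makes the binary search possible.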


In block C5 we find similar algorithms to compute P2". As one
immediately sees by inspecting the output from a trial run on the pages
following the program, even a small grammar generates an enormous number
of relations. The number is so large that we have been unable to test
the program for large grammars. Yet we feel that the information in the
tables is highly redundant, leading us to conjecture the existence of some
analogue to Floyd's f and g functions for condensing the information.
To date we have not been able to find a reliable algorithm for this
purpose.

Our inability to condense the definitions of P1" and P2" into
reasonably compact tables is the only bar to their use in the syntactic
analyzer of the compiler. It appears that (2,1)(1,2) grammars are
sufficiently powerful to describe computer languages with no further
generalization. There would be some advantage in generalizing the function P3'
to allow repeated and empty right parts in the production tables.

The sample output has been slightly rearranged from the actual
computer output. The first page contains listings of P, V_N, V_T, and
G. Then follow the definitions of the five sets. The left margin contains
the symbol number and name; the top margin the least significant digit of
the symbol number. A dot signifies that the symbol numbered in the top
margin stands in the indicated relation to the symbol in the left margin.
For example, EOF is in the head of <PROGRAM>.

The first tabulated value for P1" indicates that <EXPR> ELSE IF
is an expected triplet and that IF is not to be moved from £ to x in
the general parsing algorithm (because <EXPR> ELSE must first form
<TRUEPART>).




BEGIN COMMENT (2,1)(1,2) SYNTAX PROCESSOR  MCKEEMAN  JAN. 1966;

INTEGER MAXNSY;   COMMENT MAX NUMBER OF SYMBOLS;
INTEGER MAXNPR;   COMMENT MAX NUMBER OF PRODUCTIONS;
INTEGER MAXLPR;   COMMENT MAX LENGTH OF A PRODUCTION;
INTEGER TI, OT, T;   COMMENT TIMING INFORMATION;
INTEGER P2CSAVE, SI, I;
REAL ARRAY RECORD[0:20];   COMMENT STATISTICS STORAGE;
DEFINE PACKED = ALPHA#;

PROCEDURE TIMER;
BEGIN OT ← T; T ← TIME(1);
  WRITE(<"TIME =", F7.2, ", TOTAL ELAPSED = ", F7.2, " MIN.">,
    (T-OT)/3600, (T-TI)/3600);
END TIMER;

PROCEDURE SAV(X); VALUE X; REAL X; RECORD[SI←SI+1] ← X;

MAXNSY ← 300; MAXNPR ← 300; MAXLPR ← 5;
P2CSAVE ← SI ← 0;
T ← TI ← TIME(1);
WRITE(<"(2,1)(1,2) SYNTAX PROCESSOR, MCKEEMAN, JAN. 1966"//>);


BEGIN COMMENT SET UP GLOBAL TABLES;
INTEGER ARRAY V0, V1[0:MAXNSY];   COMMENT 12 SIG. CHARS;
INTEGER ARRAY PR[0:MAXNPR, 0:MAXLPR];   COMMENT PRODUCTIONS;
BOOLEAN ARRAY ONRIGHT[0:MAXNSY];
INTEGER NPR;   COMMENT ACTUAL NUMBER OF PRODUCTIONS READ;
INTEGER NSY, NSYN;   COMMENT ACTUAL NUMBER OF SYMBOLS READ;

BEGIN COMMENT BLOCK A; COMMENT CARD INPUT BLOCK;
  INTEGER I, J, K;
  LABEL INPUTLOOP, EOF, FOUND;
  INTEGER ARRAY P0, P1[0:MAXNPR, 0:MAXLPR];

  WRITE(<"PRODUCTIONS:"//>);
  NPR ← 0;

  INPUTLOOP:
  READ(<12A6>, FOR K ← 0 STEP 1 UNTIL MAXLPR DO
    [P0[NPR+1, K], P1[NPR+1, K]])[EOF];
  IF P0[NPR+1,1] = " " THEN WRITE(<" ">) ELSE
  BEGIN NPR ← NPR + 1;
    WRITE(<I4,X8,2A6," ← ",10A6>, NPR,
      FOR K ← 0 STEP 1 UNTIL MAXLPR DO [P0[NPR,K], P1[NPR,K]]);
  END;
  GO TO INPUTLOOP;

  EOF:
  NSY ← 0; V0[0] ← V1[0] ← " ";
  FOR K ← 0 STEP 1 UNTIL MAXLPR DO
  BEGIN FOR I ← 1 STEP 1 UNTIL NPR DO
    BEGIN FOR J ← 0 STEP 1 UNTIL NSY DO
        IF P0[I,K] = V0[J] AND P1[I,K] = V1[J] THEN
          GO TO FOUND;
      J ← NSY ← NSY + 1;
      V0[NSY] ← P0[I,K]; V1[NSY] ← P1[I,K];
      FOUND:
      PR[I,K] ← J;
      IF K ≠ 0 THEN ONRIGHT[J] ← TRUE;
    END I;
    IF K = 0 THEN NSYN ← NSY;   COMMENT STILL IN INTERMEDIATE SYM;
  END K;

  FOR I ← 2 STEP 1 UNTIL NPR DO IF PR[I,0] = 0 THEN
    PR[I,0] ← PR[I-1,0];

  WRITE([PAGE]); WRITE(<"INTERMEDIATE SYMBOLS:">);
  WRITE(<3(I8,X3,2A6)>, FOR I ← 1 STEP 1 UNTIL NSYN DO
    [I, V0[I], V1[I]]);
  WRITE(<//"TERMINAL SYMBOLS:">);
  WRITE(<3(I8,X3,2A6)>, FOR I ← NSYN+1 STEP 1 UNTIL NSY DO
    [I, V0[I], V1[I]]);

  COMMENT GATHER STATISTICS;
  SAV(NPR); SAV(NSY); SAV(NSYN);
END BLOCK A;


BEGIN COMMENT BLOCK B; COMMENT GRAMMAR CHECKS;
  INTEGER ARRAY TEST[0:NPR, 0:MAXLPR];
  BOOLEAN CHANGE, EMPTY;
  INTEGER I, J, K, Z;
  LABEL OK;

  J ← 0;
  FOR I ← 1 STEP 1 UNTIL NSYN DO IF NOT ONRIGHT[I] THEN
  BEGIN J ← J + 1;
    WRITE(</"THE UNIQUE TARGET SYMBOL IS: ", 2A6>, V0[I], V1[I]);
  END I;
  IF J ≠ 1 THEN WRITE(<"THERE IS NO UNIQUE LEFTMOST SYMBOL">);

  FOR I ← 1 STEP 1 UNTIL NPR DO FOR J ← 0 STEP 1 UNTIL MAXLPR DO
    TEST[I,J] ← IF PR[I,J] > NSYN THEN 0 ELSE PR[I,J];

  CHANGE ← TRUE;
  WHILE CHANGE DO
  BEGIN CHANGE ← FALSE;
    FOR I ← 1 STEP 1 UNTIL NPR DO
    BEGIN Z ← TEST[I,0];
      IF Z ≠ 0 THEN
      BEGIN EMPTY ← TRUE;
        FOR J ← 1 STEP 1 UNTIL MAXLPR DO
          EMPTY ← EMPTY AND TEST[I,J] = 0;
        IF EMPTY THEN FOR K ← 1 STEP 1 UNTIL NPR DO
          FOR J ← 0 STEP 1 UNTIL MAXLPR DO IF TEST[K,J] = Z THEN
            TEST[K,J] ← 0;
        CHANGE ← CHANGE OR EMPTY;
      END;
    END;
  END CHANGE;

  FOR I ← 1 STEP 1 UNTIL NPR DO IF TEST[I,0] ≠ 0 THEN
    WRITE(<"PRODUCTION", I4, " LEADS TO A NON-TERMINATING PHRASE">,
      I);

  FOR I ← 1 STEP 1 UNTIL NPR DO
  BEGIN COMMENT CHECK FOR EMPTY LEFT AND RIGHT PARTS;
    IF PR[I,0] = 0 THEN
      WRITE(<"PRODUCTION ", I4, " HAS AN EMPTY LEFT PART">,I);
    IF PR[I,1] = 0 THEN
      WRITE(<"PRODUCTION ", I4, " HAS AN EMPTY RIGHT PART">,I);
    FOR J ← I+1 STEP 1 UNTIL NPR DO
    BEGIN COMMENT CHECK FOR IDENTICAL RIGHT PARTS;
      FOR K ← 1 STEP 1 UNTIL MAXLPR DO IF PR[I,K] ≠ PR[J,K] THEN
        GO TO OK;
      WRITE(<"PRODUCTIONS ",I4," AND ",I4,
        " MUST BE DISTINGUISHED BY THE INTERPRETATION RULES">, I, J);
      OK:
    END J;
  END I;

  TIMER;
  WRITE([PAGE]);
END BLOCK B;


BEGIN COMMENT BLOCK C; COMMENT SYNTAX ANALYSIS;
PACKED ARRAY CR[0:1022];   COMMENT COORDINATES;
INTEGER ARRAY S1, S2[0:1022];   COMMENT NSY*2;
BOOLEAN ARRAY V[0:1022];
PACKED ARRAY INHEAD, INTAIL[0:NSY, 0:NSY DIV 48];
PACKED ARRAY SCD[0:NSY, 0:NSY DIV 48];
PACKED ARRAY AP, AS[0:NSY, 0:NSY DIV 48];
BOOLEAN ARRAY BEENTHERE[0:NSY];
INTEGER NVAL, P2C, FINDS;

BOOLEAN STREAM PROCEDURE GETBIT(A, I); VALUE I;
BEGIN SI ← A; SKIP I SB; TALLY ← 1;
  IF SB THEN GETBIT ← TALLY;
END GETBIT;

STREAM PROCEDURE SETBIT(A, I); VALUE I;
BEGIN DI ← A; SKIP I DB; DS ← SET;
END SETBIT;

PROCEDURE ENTER(I, J, K, X); VALUE I, J, K, X;
  INTEGER I, J, K; BOOLEAN X;
BEGIN LABEL BINARYLOOKUP, GOTITALREADY;
  INTEGER B, M, T, H, NH, L;
  IF NVAL = 1022 THEN
  BEGIN WRITE(<"TOO MANY ANALYSIS FUNCTION VALUES">);
    NVAL ← 1;
    TIMER;
  END;
  COMMENT WE PACK COORDINATES BOTH FOR STORAGE ECONOMY AND
    SPEED IN THE BINARY LOOKUP FOR INSERTION;
  B ← 0; T ← NVAL; H ← K & J[24:36:12] & I[12:36:12];
  COMMENT H IS THE COORDINATES AS POWERS OF 2*10;
  BINARYLOOKUP: M ← (B+T).[36:11];   COMMENT DIV 2;
  NH ← CR[M];
  IF NH < H THEN B ← M ELSE
  IF NH > H THEN T ← M ELSE
  BEGIN IF NOT (X EQV V[M]) THEN WRITE(<"CONFLICT, ", 6A6>,
      V0[I],V1[I], V0[J],V1[J], V0[K],V1[K]);
    FINDS ← FINDS + 1;   COMMENT FOR STATISTICS;
    GO TO GOTITALREADY;
  END;
  IF B+1 ≠ T THEN GO TO BINARYLOOKUP;
  FOR L ← NVAL STEP -1 UNTIL T DO
  BEGIN CR[L+1] ← CR[L];
    V[L+1] ← V[L];
  END L;
  CR[T] ← H; V[T] ← X;
  NVAL ← NVAL + 1;
  GOTITALREADY:
END ENTER;

PROCEDURE PUT(X, Y); VALUE X, Y; INTEGER X, Y;
BEGIN COMMENT ENTER A HEAD OR TAIL PAIR INTO LIST;
  INTEGER I; LABEL GOTITALREADY;
  FOR I ← 1 STEP 1 UNTIL P2C DO
    IF S1[I] = X THEN
      IF S2[I] = Y THEN GO TO GOTITALREADY;
  P2C ← P2C + 1;
  IF P2C > 1022 THEN WRITE(<"TOO MANY PAIRS">);
  COMMENT WE SAVE P2C FOR STATISTICAL ANALYSIS;
  IF P2C > P2CSAVE THEN P2CSAVE ← P2C;
  S1[P2C] ← X; S2[P2C] ← Y;
  GOTITALREADY:
END PUT;

PROCEDURE PRINTMATRIX(TITLE, M); FORMAT TITLE;
  PACKED ARRAY M[0,0];
BEGIN COMMENT PRINT A BOOLEAN MATRIX;
  INTEGER I, J;
  WRITE(TITLE);
  WRITE(<X9, "SYMBOL", X5, 100I1>,
    FOR I ← 1 STEP 1 UNTIL NSY DO I MOD 10);
  FOR I ← 1 STEP 1 UNTIL NSY DO
    WRITE(<I3, X3, 2A6, X2, 100A1>, I, V0[I], V1[I],
      FOR J ← 1 STEP 1 UNTIL NSY DO
        IF GETBIT(M[I,J DIV 48],ENTIER(J MOD 48)) THEN "." ELSE " ");
  TIMER;
  WRITE([PAGE]);
END PRINTMATRIX;

PROCEDURE TABULATE(N); VALUE N; ALPHA N;
BEGIN COMMENT PRINT VALUES OF P1"(X,Y,Z) AND P2"(X,Y,Z);
  INTEGER I, C1, C2, C3;
  FOR I ← 1 STEP 1 UNTIL NVAL DO
  BEGIN C1 ← CR[I].[12:12]; C2 ← CR[I].[24:12];
    C3 ← CR[I].[36:12];
    WRITE(<A2,"""","( ",6A6,") = ",A2,"""","(",2(I3,","),I3,
      ") = ", L5>,
      N, V0[C1],V1[C1], V0[C2],V1[C2], V0[C3],V1[C3],
      N, C1, C2, C3, V[I]);
  END;
  WRITE(<I5, " FUNCTION VALUES, DENSITY =", F6.2, "%",
    ", ENTRIES/VALUE", F6.2>,
    NVAL, 100×NVAL/NSY*3, (FINDS+NVAL)/NVAL);
  TIMER;
  SAV((T-OT)/3600); SAV(NVAL);
  WRITE([PAGE]);
END TABULATE;

CR[0] ← 0;   COMMENT A SAFE "BOTTOM" FOR THE BINARY LOOKUP;

BEGIN COMMENT BLOCK C 1; COMMENT HEAD AND TAIL OCCURRENCES;
  COMMENT INHEAD[I, J] IMPLIES J IS IN THE HEAD OF I;
  INTEGER I, J;

  PROCEDURE HS(S); VALUE S; INTEGER S;
  BEGIN COMMENT FIND ALL THE HEADS OF S;
    INTEGER I, J, Z;
    IF NOT GETBIT(INHEAD[S,S DIV 48],ENTIER(S MOD 48)) THEN
    BEGIN SETBIT(INHEAD[S,S DIV 48],ENTIER(S MOD 48));
      FOR I ← 1 STEP 1 UNTIL NPR DO IF PR[I,0] = S THEN
      BEGIN Z ← PR[I,1];
        HS(Z);
        FOR J ← 1 STEP 1 UNTIL NSY DO
          IF GETBIT(INHEAD[Z,J DIV 48],ENTIER(J MOD 48)) THEN
            SETBIT(INHEAD[S,J DIV 48],ENTIER(J MOD 48));
      END I;
    END;
  END HS;

  PROCEDURE TS(S); VALUE S; INTEGER S;
  BEGIN COMMENT FIND ALL THE TAILS OF S;
    INTEGER I, J, Z;
    LABEL F;
    IF NOT GETBIT(INTAIL[S,S DIV 48],ENTIER(S MOD 48)) THEN
    BEGIN SETBIT(INTAIL[S,S DIV 48],ENTIER(S MOD 48));
      FOR I ← 1 STEP 1 UNTIL NPR DO IF PR[I,0] = S THEN
      BEGIN FOR J ← MAXLPR STEP -1 UNTIL 1 DO
          IF PR[I,J] ≠ 0 THEN GO TO F;
        F: Z ← PR[I,J];
        TS(Z);
        FOR J ← 1 STEP 1 UNTIL NSY DO
          IF GETBIT(INTAIL[Z,J DIV 48],ENTIER(J MOD 48)) THEN
            SETBIT(INTAIL[S,J DIV 48],ENTIER(J MOD 48));
      END I;
    END;
  END TS;

  FOR I ← 0 STEP 1 UNTIL NSY DO FOR J ← 0 STEP 1 UNTIL NSY DIV 48
    DO INHEAD[I,J] ← INTAIL[I,J] ← 0;
  FOR I ← 1 STEP 1 UNTIL NSY DO
  BEGIN HS(I); TS(I);
  END;

  PRINTMATRIX(<//"INHEAD:"/>, INHEAD);
  PRINTMATRIX(<//"INTAIL:"/>, INTAIL);
END BLOCK C 1;


BEGIN COMMENT BLOCK C 2; COMMENT SINGLE CHARACTER DERIVATIVES;
  COMMENT SCD[I,J] IMPLIES THAT J IS A SINGLE CHARACTER
    DERIVATIVE OF I;
  INTEGER I, J, K;
  BOOLEAN CHANGE;

  FOR I ← 0 STEP 1 UNTIL NSY DO
    FOR J ← 0 STEP 1 UNTIL NSY DIV 48 DO SCD[I,J] ← 0;
  FOR I ← 1 STEP 1 UNTIL NPR DO
    IF PR[I,2] = 0 THEN
      SETBIT(SCD[PR[I,0],PR[I,1] DIV 48],ENTIER(PR[I,1] MOD
        48));

  CHANGE ← TRUE;
  WHILE CHANGE DO
  BEGIN CHANGE ← FALSE;
    FOR I ← 1 STEP 1 UNTIL NSYN DO
      FOR J ← 1 STEP 1 UNTIL NSYN DO
        IF GETBIT(SCD[I,J DIV 48],ENTIER(J MOD 48)) THEN
          FOR K ← 1 STEP 1 UNTIL NSY DO
            IF GETBIT(SCD[J,K DIV 48],ENTIER(K MOD 48)) THEN
              IF NOT GETBIT(SCD[I,K DIV 48],ENTIER(K MOD 48))
              THEN
              BEGIN CHANGE ← TRUE;
                SETBIT(SCD[I,K DIV 48],ENTIER(K MOD 48));
              END;
  END CHANGE;

  PRINTMATRIX(<//"SINGLE CHARACTER DERIVATIVES:"/>, SCD);
END BLOCK C 2;


BEGIN COMMENT BLOCK C 3; COMMENT PREDECESSORS AND SUCCESSORS;
  COMMENT AP[I,J] IMPLIES J IS AN ALLOWED PREDECESSOR OF I;
  INTEGER I, J, P;

  FOR I ← 0 STEP 1 UNTIL NSY DO FOR J ← 0 STEP 1 UNTIL NSY DIV 48
    DO AP[I,J] ← AS[I,J] ← 0;

  FOR P ← 1 STEP 1 UNTIL NSY DO
    FOR I ← 1 STEP 1 UNTIL NPR DO
    BEGIN COMMENT PREDECESSORS FIRST;
      FOR J ← 2 STEP 1 UNTIL MAXLPR DO
        IF PR[I,J] = 0 THEN ELSE
        IF GETBIT(INHEAD[PR[I,J],P DIV 48],ENTIER(P MOD 48))
        THEN
          SETBIT(AP[P,PR[I,J-1] DIV 48],ENTIER(PR[I,J-1] MOD
            48));
      FOR J ← 1 STEP 1 UNTIL MAXLPR-1 DO
        IF PR[I,J+1] = 0 THEN ELSE
        IF GETBIT(INTAIL[PR[I,J],P DIV 48],ENTIER(P MOD 48))
        THEN
          SETBIT(AS[P,PR[I,J+1] DIV 48],ENTIER(PR[I,J+1] MOD
            48));
    END I P;

  PRINTMATRIX(<//"ALLOWED PREDECESSORS:"/>, AP);
  PRINTMATRIX(<//"ALLOWED SUCCESSORS:"/>, AS);
END BLOCK C 3;


BEGIN COMMENT BLOCK C 4; COMMENT HIERARCHY ANALYSIS;
  INTEGER A, B, C, P, X, Y, Z, I, I1, I2, I3;

  PROCEDURE T2S(P); VALUE P; INTEGER P;
  BEGIN COMMENT THE CANONICAL PARSE TAIL 2 SYMBOLS OF P;
    INTEGER I, J, R, X, Y;
    LABEL F;
    BEENTHERE[P] ← TRUE;
    FOR I ← 1 STEP 1 UNTIL NPR DO IF PR[I,0] = P THEN
    BEGIN FOR J ← MAXLPR STEP -1 UNTIL 1 DO IF PR[I,J] ≠ 0 THEN
        GO TO F;
      F: R ← PR[I,J];
      IF J ≠ 1 THEN
      BEGIN COMMENT PRODUCTION LENGTH AT LEAST TWO;
        X ← PR[I,J-1];
        PUT(X, R);
        FOR Y ← 1 STEP 1 UNTIL NSYN DO
          IF GETBIT(SCD[Y,R DIV 48],ENTIER(R MOD 48)) THEN
            PUT(X, Y);
      END;
      IF NOT BEENTHERE[R] THEN T2S(R);
    END I;
  END T2S;

  NVAL ← 1; FINDS ← 0; CR[1] ← 1024*3;

  FOR I1 ← 1 STEP 1 UNTIL NPR DO IF PR[I1,2] ≠ 0 THEN
  BEGIN COMMENT HIERARCHY ANALYSIS RELATIONS;
    B ← PR[I1,1]; C ← PR[I1,2];
    P ← PR[I1,0];
    FOR I ← 1 STEP 1 UNTIL NSY DO BEENTHERE[I] ← FALSE;
    P2C ← 0; T2S(B);
    FOR Z ← NSYN+1 STEP 1 UNTIL NSY DO
      IF GETBIT(INHEAD[C,Z DIV 48],ENTIER(Z MOD 48)) THEN
      BEGIN COMMENT Z ARE IN HST(C);
        FOR X ← 1 STEP 1 UNTIL NSY DO
          IF GETBIT(AP[P,X DIV 48],ENTIER(X MOD 48)) THEN
          BEGIN ENTER(X, B, Z, FALSE);
            IF B ≤ NSYN THEN
              FOR Y ← 1 STEP 1 UNTIL NSY DO
                IF GETBIT(SCD[B,Y DIV 48],ENTIER(Y MOD 48)) THEN
                  ENTER(X, Y, Z, TRUE);
          END X;
        FOR I3 ← 1 STEP 1 UNTIL P2C DO
          ENTER(S1[I3], S2[I3], Z, TRUE);
      END Z;

    FOR I2 ← 3 STEP 1 UNTIL MAXLPR DO IF PR[I1,I2] ≠ 0 THEN
    BEGIN
      A ← PR[I1,I2-2]; B ← PR[I1,I2-1]; C ← PR[I1,I2];
      FOR I ← 1 STEP 1 UNTIL NSY DO BEENTHERE[I] ← FALSE;
      P2C ← 0; T2S(B);
      FOR Z ← NSYN+1 STEP 1 UNTIL NSY DO
        IF GETBIT(INHEAD[C,Z DIV 48],ENTIER(Z MOD 48)) THEN
        BEGIN ENTER(A, B, Z, FALSE);
          IF B ≤ NSYN THEN
            FOR Y ← 1 STEP 1 UNTIL NSY DO
              IF GETBIT(SCD[B,Y DIV 48],ENTIER(Y MOD 48)) THEN
                ENTER(A, Y, Z, TRUE);
          FOR I3 ← 1 STEP 1 UNTIL P2C DO
            ENTER(S1[I3], S2[I3], Z, TRUE);
        END Z;
    END I2;
  END I1;

  NVAL ← NVAL - 1;
  WRITE(<"HIERARCHY ANALYSIS FUNCTIONS:"/>);
  TABULATE("P1");
END BLOCK C 4;


BEGIN COMMENT BLOCK C 5; COMMENT PRODUCTION RECOGNITION;
  INTEGER A, B, C, R, P, Y, Z, I, I1, I2, I3;
  LABEL LASTONE;

  PROCEDURE H2S(P); VALUE P; INTEGER P;
  BEGIN COMMENT THE CANONICAL PARSE HEAD 2 SYMBOLS IN P;
    INTEGER I, R, X, Y, Z;
    BEENTHERE[P] ← TRUE;
    FOR I ← 1 STEP 1 UNTIL NPR DO IF PR[I,0] = P THEN
    BEGIN
      IF PR[I,2] ≠ 0 THEN
      BEGIN COMMENT PRODUCTION OF LENGTH AT LEAST TWO;
        X ← PR[I,1]; R ← PR[I,2];
        FOR Y ← 1 STEP 1 UNTIL NSY DO
          IF GETBIT(INHEAD[R,Y DIV 48],ENTIER(Y MOD 48)) THEN
            PUT(X, Y);
      END;
      R ← PR[I,1];
      IF NOT BEENTHERE[R] THEN H2S(R);
    END I;
  END H2S;

  NVAL ← 1; FINDS ← 0; CR[1] ← 1024*3;

  FOR I1 ← 1 STEP 1 UNTIL NPR DO IF PR[I1,2] ≠ 0 THEN
  BEGIN COMMENT PRODUCTION RECOGNITION RELATIONS;
    FOR I2 ← 2 STEP 1 UNTIL MAXLPR DO
    BEGIN A ← PR[I1,I2-1]; B ← PR[I1,I2];
      IF I2 = MAXLPR THEN GO TO LASTONE;
      C ← PR[I1,I2+1];
      IF C = 0 THEN GO TO LASTONE;
      ENTER(A, B, C, FALSE);
      IF B ≤ NSYN THEN
        FOR Y ← 1 STEP 1 UNTIL NSY DO
          IF GETBIT(SCD[B,Y DIV 48],ENTIER(Y MOD 48)) THEN
            FOR Z ← NSYN+1 STEP 1 UNTIL NSY DO
              IF GETBIT(INHEAD[C,Z DIV 48],ENTIER(Z MOD 48)) THEN
                ENTER(A, Y, Z, TRUE);
      FOR I ← 1 STEP 1 UNTIL NSY DO BEENTHERE[I] ← FALSE;
      P2C ← 0; H2S(B);
      FOR I3 ← 1 STEP 1 UNTIL P2C DO
        ENTER(A, S1[I3], S2[I3], TRUE);
    END I2;

    LASTONE:
    P ← PR[I1,0];
    FOR R ← 1 STEP 1 UNTIL NSY DO
      IF GETBIT(AS[P,R DIV 48],ENTIER(R MOD 48)) THEN
        FOR Z ← NSYN+1 STEP 1 UNTIL NSY DO
          IF GETBIT(INHEAD[R,Z DIV 48],ENTIER(Z MOD 48)) THEN
          BEGIN ENTER(A, B, Z, FALSE);
            IF B ≤ NSYN THEN
              FOR Y ← 1 STEP 1 UNTIL NSY DO
                IF GETBIT(SCD[B,Y DIV 48],ENTIER(Y MOD 48)) THEN
                  ENTER(A, Y, Z, TRUE);
          END Z R;
    FOR I ← 1 STEP 1 UNTIL NSY DO BEENTHERE[I] ← FALSE;
    P2C ← 0; H2S(B);
    FOR I3 ← 1 STEP 1 UNTIL P2C DO
      ENTER(A, S1[I3], S2[I3], TRUE);
  END I1;

  NVAL ← NVAL - 1;
  WRITE(<"PRODUCTION RECOGNITION FUNCTIONS:"/>);
  TABULATE("P2");
END BLOCK C 5;

END BLOCK C;

END;

SAV((T-TI)/3600); SAV(P2CSAVE);
WRITE(PRINFIL, <9E8.2, "MCKEEMAN">, FOR I ← 1 STEP 1 UNTIL SI DO
  RECORD[I]);
END.

(2,1)(1,2) SYNTAX PROCESSOR, MCKEEMAN, JAN. 1966


PRODUCTIONS:

   1  <PROGRAM>      ←  EOF <EXPR> EOF
   2  <EXPR>         ←  <IF CLAUSE> <TRUEPART> <EXPR>
   3                 ←  <SUM>
   4  <TRUEPART>     ←  <EXPR> ELSE
   5  <IF CLAUSE>    ←  IF <EXPR> THEN
   6  <SUM>          ←  <SUM> + <PRIMARY>
   7                 ←  <SUM> - <PRIMARY>
   8                 ←  + <PRIMARY>
   9                 ←  - <PRIMARY>
  10                 ←  <PRIMARY>
  11  <PRIMARY>      ←  IDENT
  12                 ←  INTEGER
  13                 ←  ( <EXPR> )

INTERMEDIATE SYMBOLS:
   1  <PROGRAM>         2  <EXPR>            3  <TRUEPART>
   4  <IF CLAUSE>       5  <SUM>             6  <PRIMARY>

TERMINAL SYMBOLS:
   7  EOF               8  IF                9  +
  10  -                11  IDENT            12  INTEGER
  13  (                14  ELSE             15  THEN
  16  )

THE UNIQUE TARGET SYMBOL IS: <PROGRAM>

TIME = 0.12, TOTAL ELAPSED = 0.12 MIN.


INTAIL:

         SYMBOL     1234567890123456

[Boolean matrix for symbols 1 <PROGRAM> through 16 ); the column
positions of the dots are not recoverable here.]

TIME = 0.04, TOTAL ELAPSED = 0.


SINGLE CHARACTER DERIVATIVES:

         SYMBOL     1234567890123456

[Boolean matrix for symbols 1 <PROGRAM> through 16 ); the column
positions of the dots are not recoverable here.]

TIME = 0.04, TOTAL ELAPSED = 0.24 MIN.


ALLOWED PREDECESSORS:

         SYMBOL     1234567890123456

[Boolean matrix for symbols 1 <PROGRAM> through 16 ); the column
positions of the dots are not recoverable here.]

TIME = 0.04, TOTAL ELAPSED = 0.28 MIN.


ALLONED  SUCCESSOHS* 


SYMBOL 

123AS6/890123A56 

1 

<pruqkam> 

2 

<tXP.O 

• 

•  t  • 

3 

<TRU£PART> 

• 

A 

<IF  CLAUSE* 

• 

5 

<SUM> 

•  ♦# 

•  •  • 

8 

<PR I  MARY* 

•  •  • 

t  •  • 

1 

EOF 

f 

6 

IF 

• 

9 

♦ 

• 

m 

• 

11 

IDEM 

•  t  • 

t  •  • 

12 

INTEGER 

•  •  • 

ft# 

13 

( 

• 

1A 

ELSE 

• 

IS 

THEN 

f 

18 

) 

•  •  • 

•  •  t 

TIKE 

>  O.OA#  TOTAL 

ELAPSED  * 

0.3?  MIN 

HIERARCHY ANALYSIS FUNCTIONS:

P1"(<EXPR> ELSE IF)  =  P1"(2, 14, 8)  =  TRUE
P1"(<EXPR> ELSE +)  =  P1"(2, 14, 9)  =  TRUE
P1"(<EXPR> ELSE -)  =  P1"(2, 14, 10)  =  TRUE
P1"(<EXPR> ELSE IDENT)  =  P1"(2, 14, 11)  =  TRUE
P1"(<EXPR> ELSE INTEGER)  =  P1"(2, 14, 12)  =  TRUE
P1"(<EXPR> ELSE ()  =  P1"(2, 14, 13)  =  TRUE
P1"(<EXPR> THEN IF)  =  P1"(2, 15, 8)  =  TRUE
P1"(<EXPR> THEN +)  =  P1"(2, 15, 9)  =  TRUE
P1"(<EXPR> THEN -)  =  P1"(2, 15, 10)  =  TRUE
P1"(<EXPR> THEN IDENT)  =  P1"(2, 15, 11)  =  TRUE
P1"(<EXPR> THEN INTEGER)  =  P1"(2, 15, 12)  =  TRUE
P1"(<EXPR> THEN ()  =  P1"(2, 15, 13)  =  TRUE
P1"(<EXPR> ) EOF)  =  P1"(2, 16, 7)  =  TRUE
P1"(<EXPR> ) +)  =  P1"(2, 16, 9)  =  TRUE
P1"(<EXPR> ) -)  =  P1"(2, 16, 10)  =  TRUE
P1"(<EXPR> ) ELSE)  =  P1"(2, 16, 14)  =  TRUE
P1"(<EXPR> ) THEN)  =  P1"(2, 16, 15)  =  TRUE
P1"(<EXPR> ) ))  =  P1"(2, 16, 16)  =  TRUE
P1"(<TRUEPART> <EXPR> EOF)  =  P1"(3, 2, 7)  =  TRUE
P1"(<TRUEPART> <EXPR> ELSE)  =  P1"(3, 2, 14)  =  TRUE
P1"(<TRUEPART> <EXPR> THEN)  =  P1"(3, 2, 15)  =  TRUE
P1"(<TRUEPART> <EXPR> ))  =  P1"(3, 2, 16)  =  TRUE
P1"(<TRUEPART> <IF CLAUSE> IF)  =  P1"(3, 4, 8)  =  FALSE
P1"(<TRUEPART> <IF CLAUSE> +)  =  P1"(3, 4, 9)  =  FALSE
P1"(<TRUEPART> <IF CLAUSE> -)  =  P1"(3, 4, 10)  =  FALSE
P1"(<TRUEPART> <IF CLAUSE> IDENT)  =  P1"(3, 4, 11)  =  FALSE
P1"(<TRUEPART> <IF CLAUSE> INTEGER)  =  P1"(3, 4, 12)  =  FALSE
P1"(<TRUEPART> <IF CLAUSE> ()  =  P1"(3, 4, 13)  =  FALSE
P1"(<TRUEPART> <SUM> +)  =  P1"(3, 5, 9)  =  FALSE
P1"(<TRUEPART> <SUM> -)  =  P1"(3, 5, 10)  =  FALSE
P1"(<TRUEPART> <PRIMARY> +)  =  P1"(3, 6, 9)  =  TRUE
P1"(<TRUEPART> <PRIMARY> -)  =  P1"(3, 6, 10)  =  TRUE
P1"(<TRUEPART> IF IF)  =  P1"(3, 8, 8)  =  FALSE
P1"(<TRUEPART> IF +)  =  P1"(3, 8, 9)  =  FALSE
P1"(<TRUEPART> IF -)  =  P1"(3, 8, 10)  =  FALSE
P1"(<TRUEPART> IF IDENT)  =  P1"(3, 8, 11)  =  FALSE
P1"(<TRUEPART> IF INTEGER)  =  P1"(3, 8, 12)  =  FALSE
P1"(<TRUEPART> IF ()  =  P1"(3, 8, 13)  =  FALSE
P1"(<TRUEPART> + IDENT)  =  P1"(3, 9, 11)  =  FALSE
P1"(<TRUEPART> + INTEGER)  =  P1"(3, 9, 12)  =  FALSE
P1"(<TRUEPART> + ()  =  P1"(3, 9, 13)  =  FALSE
P1"(<TRUEPART> - IDENT)  =  P1"(3, 10, 11)  =  FALSE
P1"(<TRUEPART> - INTEGER)  =  P1"(3, 10, 12)  =  FALSE
P1"(<TRUEPART> - ()  =  P1"(3, 10, 13)  =  FALSE
P1"(<TRUEPART> IDENT +)  =  P1"(3, 11, 9)  =  TRUE
P1"(<TRUEPART> IDENT -)  =  P1"(3, 11, 10)  =  TRUE
P1"(<TRUEPART> INTEGER +)  =  P1"(3, 12, 9)  =  TRUE
P1"(<TRUEPART> INTEGER -)  =  P1"(3, 12, 10)  =  TRUE
P1"(<TRUEPART> ( IF)  =  P1"(3, 13, 8)  =  FALSE
P1"(<TRUEPART> ( +)  =  P1"(3, 13, 9)  =  FALSE
P1"(<TRUEPART> ( -)  =  P1"(3, 13, 10)  =  FALSE
P1"(<TRUEPART> ( IDENT)  =  P1"(3, 13, 11)  =  FALSE
P1"(<TRUEPART> ( INTEGER)  =  P1"(3, 13, 12)  =  FALSE
P1"(<TRUEPART> ( ()  =  P1"(3, 13, 13)  =  FALSE
P1"(<IF CLAUSE> <EXPR> ELSE)  =  P1"(4, 2, 14)  =  FALSE
P1"(<IF CLAUSE> <TRUEPART> IF)  =  P1"(4, 3, 8)  =  FALSE
P1"(<IF CLAUSE> <TRUEPART> +)  =  P1"(4, 3, 9)  =  FALSE
P1"(<IF CLAUSE> <TRUEPART> -)  =  P1"(4, 3, 10)  =  FALSE
P1"(<IF CLAUSE> <TRUEPART> IDENT)  =  P1"(4, 3, 11)  =  FALSE
P1"(<IF CLAUSE> <TRUEPART> INTEGER)  =  P1"(4, 3, 12)  =  FALSE
P1"(<IF CLAUSE> <TRUEPART> ()  =  P1"(4, 3, 13)  =  FALSE
P1"(<IF CLAUSE> <IF CLAUSE> IF)  =  P1"(4, 4, 8)  =  FALSE
P1"(<IF CLAUSE> <IF CLAUSE> +)  =  P1"(4, 4, 9)  =  FALSE
P1"(<IF CLAUSE> <IF CLAUSE> -)  =  P1"(4, 4, 10)  =
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SECTION  3 


THE  KERNEL  LANGUAGE 

Principles  of  Design 

The  kernel  language  must  above  all  provide  the  programmer  with  a 
convenient  means  for  controlling  an  automatic  digital  computer.  Our 
first  task  is  to  discuss  several  general  principles  of  language  design 
and the contribution each makes toward the final form of the kernel
language.

We  require  that  the  language  be  minimal  in  that  the  forms  of  the 
language  must  be  concise  and  that  there  be  as  few  kinds  of  forms  as 
necessary.  The  conciseness  and  mnemonic  significance  of  expressions  in 
program  text  will  depend  upon  the  available  character  set  as  well  as  the 
aesthetic  suitability  of  the  multicharacter  symbols  chosen  to  represent 
the  various  linguistic  entities.  We  have  exercised  considerable  care  in 
choosing  the  forms  for  the  kernel  language,  drawing  from  the  notations 
of Algol, Euler [25], Iverson's language [15] and PL/I [14]. We nevertheless
realize that our readers with different experience in language or
with different hardware may take strong exception to our choices. Our
interest is primarily in the organization behind the linguistic façade
and  we  take  refuge  in  the  realization  that  the  language  user  can  choose 
his  own  forms  with  the  aid  of  the  mechanisms  of  the  extendable  compiler. 
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We  minimize  the  number  of  different  structural  forms  by  requiring 


that  the  kernel  language  be  involuted.  Involution  is  achieved  by  avoiding 
constructs  that  are  applicable  only  in  local  context;  we  give  some  examples 
of  failures  in  existing  computer  languages. 

In  Algol  60  we  find  the  following  isolated  features: 

(1)  A  primitive  list  structure  in  the  constructs  <for  list> 

and  <actual  parameter  list>  which  is  unavailable  elsewhere  (for  instance, 
to  be  used  in  array  initialization). 

(2)  General  call  by  name  is  available  only  through  actual  parameters. 

(3)  Dynamic memory allocation is available only at block entry.

We also find that most compilers provide a separate language for input
and output which includes only a fraction of the power of the complete
language. In each case the power of the language can be increased, the
number of primitive concepts reduced and the compiler simplified by
bringing the action out into the main program on a level with other
statements.

By choosing operators and data types to reflect closely the mental
processes of the language user we can substantially add to his ability to
write brief and lucid programs. With distressing frequency we find that
existing computers are ill-suited to the tasks thus set. We will find that
our goal of designing a mutable computer language frequently implies a
more anthropoid machine.

A program can be viewed as a sequence of operations on a data
structure. It is necessary to provide the programmer with forms designed
to  control  the  sequence  conveniently.  We  find  that  with  a  sufficiently 
elaborate  set  of  sequence  controlling  forms,  we  have  no  need  for  the 
more  traditional  labels  and  go-to  statements.  Lest  we  be  misunderstood, 
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the  inclusion  of  labels  would  not  appreciably  complicate  the  translator. 

We  would  regard  the  appearance  of  the  label  definition  as  an  instruction 
to  initialize  the  corresponding  local  variable  to  an  appropriate  value 
of  type  label  upon  entry  into  the  scope  of  the  variable.  The  mechanics 
of implementing the go-to statement are given in Wirth and Weber ([25], p. 52).
We feel that labels are an anachronistic holdover from early computer
languages and are not in the collection of basic concepts.

Whenever  possible  we  defer  actions  to  a  later  time.  A  deferred 
action  implies  an  increased  freedom  since  we  have  preserved  our  ability 
to  choose  what  action,  if  any,  to  take. 

In particular, we shall require that each value be marked with its
type during execution. In this way we can make the machine operators
dynamically data dependent.†
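
As an illustration only, the idea of type-marked values can be sketched in Python. The class name `Value`, the tag strings and the function `times` are hypothetical names chosen for this sketch, not part of the kernel language; the elementwise behaviour on lists is modelled on Example 2 later in this section, and other operand combinations are simply rejected here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    tag: str       # hypothetical type mark: "integer" or "list"
    data: object   # an int, or a tuple of Value


def from_python(obj) -> Value:
    """Wrap a Python int, or a sequence of such objects, as a tagged Value."""
    if isinstance(obj, int):
        return Value("integer", obj)
    return Value("list", tuple(from_python(item) for item in obj))


def times(a: Value, b: Value) -> Value:
    """Multiply two tagged values; the tags decide what the operator does."""
    if a.tag == "integer" and b.tag == "integer":
        return Value("integer", a.data * b.data)
    if a.tag == "list" and b.tag == "list":
        if len(a.data) != len(b.data):
            raise ValueError("lists differ in length")
        return Value("list", tuple(times(x, y) for x, y in zip(a.data, b.data)))
    raise TypeError(f"times is not defined for {a.tag} and {b.tag}")
```

The operator inspects the marks at execution time, so the same call multiplies two integers or two lists member by member without any declaration at the point of use.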

The  extendable  compiler  is  a  translator  from  the  kernel  language 
into  a  machine  language.  That  language  will  generally  be  a  mixture  of 
direct  commands  to  the  hardware  and  interpretable  information  to  direct 
the  hardware  and  other  programs  present  at  execution  time.  We  will  call 
the  program  structure  present  at  execution  time  the  interpretive  system 
(or  simply,  the  system)  to  distinguish  it  from  the  hardware. 


† Consider, for instance, the effect of the arithmetic operators in
Iverson's language ([15], p. 13). Dynamic typing demands a memory
organization substantially different from any known to the author. It
can be avoided by adding typing information to the declaration structure
of the kernel language.
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The  program,  the  system  and  the  hardware  form  three  levels  of 


control over the action of the machines. It is possible to have even
more levels of control than those described here. For instance, the
microprogramming feature of the IBM System/360 line of machines [19]
could be inserted between the interpreter and the hardware.‡ We may
change the system at any level. As we progress down the levels, the
changes become more difficult (more expensive) and the results are more
general.


Example  Programs  in  the  Kernel  Language 

We  have  implemented  an  extendable  compiler  for  the  kernel  language 
on  the  Burroughs  B5500.  The  actual  compiler  and  its  description  are  given 
elsewhere  [20]  but  we  wish  to  present  the  results  of  the  execution  of 
selected  kernel  language  programs  as  motivation  for  the  sequel.  (Note 
another  extensive  example  on  page  55.) 

We give several trivial examples which are essentially self-explanatory
and finish with a version of the extendable compiler written
in the kernel language. The programs and output are given in typescript
instead of actual computer listing because the B5500 character set is not
sufficiently rich to produce readable listing. We present the B5500 listing
of the first example for the purpose of reader comparison.
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This  has  in  fact  been  done  by  H.  Weber  for  Euler  IV  on  IBM  560  model  50. 


Example  1.  The  procedure  assigned  to  the  identifier  "factorial"  gives 
the  usual  recursive  definition  of  the  factorial  function.  The  local 
variable  "n"  is  initialized  from  the  parameter  list  when  the  procedure 
is activated. Note the subscript "[1]". If it were omitted the procedure
would return a value of type list with one member equal to the required
factorial. The subscript, here analogous to the assignment to a procedure
identifier in Algol 60, serves to select the desired value.

Note also that the identifier "k" does not appear in the declaration
of its list. It is local to the scope of the iterative statement
and is declared by its appearance as the control variable.

{new factorial,
 factorial ← ©
    {new n, if n = 0 then 1 else n × factorial{n-1} }[1],
 for all k from 1 to 6 do
    out ← (k base 10) © " factorial = " ©
          (factorial{k} base 10) © cr
} eof

*** output ***

1 factorial = 1
2 factorial = 2
3 factorial = 6
4 factorial = 24
5 factorial = 120
6 factorial = 720
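
For readers more at home in a present-day language, the same program can be sketched in Python. This is a loose transliteration under our own naming (`factorial`, `report`), not a statement about how the kernel language is implemented.

```python
def factorial(n):
    """Recursive factorial, mirroring the procedure assigned to "factorial"."""
    return 1 if n == 0 else n * factorial(n - 1)


def report(limit=6):
    """One output line per k, as in the "for all k from 1 to 6" loop."""
    return [f"{k} factorial = {factorial(k)}" for k in range(1, limit + 1)]


if __name__ == "__main__":
    print("\n".join(report()))
```

Python returns the value of the conditional expression directly, so nothing corresponds to the subscript "[1]" that selects the result from the kernel-language list.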

B5500  Version  of  Example  1. 
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BLG 1 N  *  TEST  PROGRAM  FOR  RECURSIVE  FACTORIAL  * 
NEW FACTORIAL N,

FACTORIAL ← #

BEGIN NEW N,

IF N = 0 THEN 1 ELSE N×FACTORIAL BEGIN N-1 END
END[1],

FOR ALL N FROM 1 TO 6 DO

OUT ← (N BASE 10) CAT " FACTORIAL = " CAT
(FACTORIAL BEGIN N END BASE 10) CAT CR
END EOF


*** output ***

1 FACTORIAL = 1
2 FACTORIAL = 2
3 FACTORIAL = 6
4 FACTORIAL = 24
5 FACTORIAL = 120
6 FACTORIAL = 720


1110 INSTRUCTIONS EXECUTED

(Operation names and execution counts, in the order printed, eight to a row. Where the scan shows fewer names than counts, the missing names are not recoverable and no column alignment is implied.)

-- DR--  CAT  UNION  INTER  DIFF  BASE  OR  AND
57  16  0  0  0  12  0  0

<  ≤  =  ≠  ≥  >  MEM  INCL
0  0  27  0  0  0  0  0

CONTAI  EQV  MAX  MIN  +  -  ×
0  0  0  0  0  21  21  0

MOD  MV  *  NOTMEM  INDEX  LIST
0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0

/  PO  STO  SET  BRB  FETCH  XCG
0  0  1  7  1  6  102  1

NAME  JOF  BRF  VAL  BEGIN  END  XEQ  CASXIT
121  27  6  249  56  56  0  0

SUBS  CALL  AP  RTN  EOS  FOR  FORXIT  EOP
27  27  175  27  56  7  1  1

NOT
0  0  0  0  0  0  0  0

MINUS  ABS  TYPE  ROUND  LENGTH  CHOP
0  0  0  0  0  0  0  0
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H  CM  in  VO 


Example  2.  Inner  product. 


{ out ← ((+/ {1,2,3} × {3,2,1}) base 10) © cr} eof

*** output ***

10 
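
A rough Python equivalent may make the two steps explicit: a member-by-member product of the two lists, followed by a reduction with "+". The function name `inner_product` is our own.

```python
from operator import mul


def inner_product(xs, ys):
    """Elementwise product followed by a sum, like +/ (xs × ys)."""
    if len(xs) != len(ys):
        raise ValueError("lists differ in length")
    return sum(map(mul, xs, ys))


if __name__ == "__main__":
    print(inner_product([1, 2, 3], [3, 2, 1]))
```

Here 1×3 + 2×2 + 3×1 gives the value 10 shown in the output above.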


Example  3.  A  simple  sort  procedure. 

{ new sort,
  sort ← ©
    { new x,
      for all i from 1 to length x do
        for all j from i+1 to length x do
          (if x[i] > x[j] then x[{i,j}] ← x[{j,i}]),
      x
    }[2],
  for all i from sort{{6,5,4,3,2,1}} do
    out ← (i base 10) © cr
} eof

***  output  *** 
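
The procedure of Example 3 is an exchange sort whose swap is written as an assignment to a list of two subscripts. A minimal Python sketch of the same algorithm follows; the name `exchange_sort` is ours, and copying the argument is an assumption made to keep the sketch free of side effects.

```python
def exchange_sort(values):
    """Sort by comparing each position with every later one and swapping."""
    x = list(values)                      # local copy of the parameter
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[i] > x[j]:
                x[i], x[j] = x[j], x[i]   # parallel assignment of both members
    return x


if __name__ == "__main__":
    for item in exchange_sort([6, 5, 4, 3, 2, 1]):
        print(item)
```

The tuple assignment plays the role of `x[{i,j}] ← x[{j,i}]`: both right-hand values are fetched before either position is overwritten.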
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Example  4.  A  procedure  to  generate  all  the  permutations  of  an 


ordered  set. 


{ new perm,
  perm ← ©
    { new x,
      if 0 = length x then {x} else
        ©/(for all i from 1 to length x do
             for all t from perm{x[1 to i-1] © x[i+1 to length x]}
               do x[i to i] © t)
    }[1],
  for all test from {"a", "ab", "abc", "abcd"} do
    for all p from perm{test} do out ← p cat cr
} eof

***  output  *** 

a 
ab 
ba 
abc 
acb 
bac 
bca 
cab 
cba
abcd
abdc
acbd
acdb
•  •  • 
etc. 
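
The recursion in Example 4 can be sketched in Python as follows. Each member in turn is removed, the remainder is permuted recursively, and the removed member is catenated in front of every result. The name `perm` follows the listing; the use of Python strings for the ordered set is an assumption of this sketch.

```python
def perm(x):
    """Return all permutations of the string x, in the order Example 4 prints them."""
    if len(x) == 0:
        return [x]                       # one permutation of the empty set
    result = []
    for i in range(len(x)):
        rest = x[:i] + x[i + 1:]         # x without its i-th member
        for t in perm(rest):
            result.append(x[i] + t)
    return result


if __name__ == "__main__":
    for test in ["a", "ab", "abc"]:
        for p in perm(test):
            print(p)
```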
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Example  5.  The  following  program  is  a  compiler-executor  for  a  small 
language.  The  organization  of  the  program  is  essentially  that  of  the 
extendable  compiler  written  for  the  Burroughs  B5500.  We  will  make 
comments  on  the  kernel  language  constructs  used,  the  organization  of 
the  compiler-executor  and  the  implementation  of  the  small  language. 

We find the following major sections: (1) The syntactic analysis
tables,  (2)  The  scanner,  (3)  The  compile  actions  definition  and  the 
compiler,  (4)  The  execute  actions  definition  and  the  executor,  (5)  The 
test  program  and  its  output. 

In  the  outermost  list  we  find  the  declaration  of  all  the  global 
identifiers.  To  seven  of  them  we  find  immediate  assignments  of  syntactic 
analysis tables. The tables are best understood in reference to the
Backus-Naur Form description of the small language contained in the comments
in the compile action definitions. The table "reservedsymbols"
is a list of strings which correspond to the nonterminal and terminal
symbols in the grammar. The position of a symbol in the list is called
its symbol number.

The table "productionrightparts" is a list of lists, each of the
latter corresponding, in order, to the right part of a production
(<program> is symbol 1, eof is symbol 15, {1,15} corresponds to
<program> eof). "productionleftparts" contains the symbol number of the
left part of the corresponding production.
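
How the two production tables cooperate can be sketched in Python. Only the first three productions are transcribed here, and the helper names `index_of` and `reduce_crs` are hypothetical. The convention that a failed lookup yields 0 is taken from the scanner's test "if nextsymbol = 0"; the symbol number 21 for "(" in the usage line is our reading of the vocabulary list.

```python
# First three productions only: right parts and the matching left parts.
PRODUCTION_RIGHT_PARTS = [(1, 15), (15, 2, 15), (3,)]
PRODUCTION_LEFT_PARTS = [1, 1, 2]


def index_of(item, table):
    """1-based position of item in table, or 0 when it is absent."""
    try:
        return table.index(item) + 1
    except ValueError:
        return 0


def reduce_crs(stack, lp, xp):
    """Replace stack[lp..xp] (1-based, inclusive, xp is the top) by its left part.

    Returns the production number and the new stack.
    """
    pn = index_of(tuple(stack[lp - 1:xp]), PRODUCTION_RIGHT_PARTS)
    if pn == 0:
        raise ValueError("no production has this right part")
    return pn, stack[:lp - 1] + [PRODUCTION_LEFT_PARTS[pn - 1]]


if __name__ == "__main__":
    # A stack holding "(" followed by eof <stmt> eof reduces by production 2.
    print(reduce_crs([21, 15, 2, 15], 2, 4))
```

Because the right parts are stored in production order, the position found by the lookup is the production number itself, which then serves as the subscript into the left-part table.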

The next four tables are linearized representations for the parsing
functions P1' and P2' which locate the right and left ends of the
next CRS. All seven tables could have been produced by a syntax preprocessor
similar to the symbol pair algorithm of Section 2.
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The  scanner  must  fetch  the  next  terminal  symbol  from  the  input 
text  each  time  it  is  called.  In  the  case  of  the  small  language  this 
means  identifying  digits  (which  are  less  than  10  in  our  character  set), 
catenating  identifiers  (letters  are  less  than  36),  matching  reserved 
identifiers  with  their  syntactic  symbol  numbers,  entering  variables  into 
the  symbol  table  and  matching  special  characters  with  their  syntactic 
symbol  numbers. 
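
The scanner's three cases can be sketched in Python. The original relies on the B5500 collating sequence (digits below 10, letters below 36); this sketch substitutes Python's character tests, which is an assumption, as are the names `RESERVED` and `scan` and the choice to return a tuple rather than set global variables.

```python
RESERVED = {"go", "to", "output", "if", "then", "begin", "end", "eof"}


def scan(text, tp, variables):
    """Return (symbol, value, new_tp) for the next terminal at or after tp.

    variables is the symbol table; a new identifier is appended to it and
    its 1-based position becomes the value of the "<ident>" symbol.
    The caller must not scan past the end of text.
    """
    while text[tp].isspace():
        tp += 1
    ch = text[tp]
    if ch.isdigit():                          # a single digit
        return "<digit>", int(ch), tp + 1
    if ch.isalpha():                          # catenate an identifier
        start = tp
        while tp < len(text) and text[tp].isalnum():
            tp += 1
        word = text[start:tp]
        if word in RESERVED:
            return word, None, tp
        if word not in variables:             # enter a new variable
            variables.append(word)
        return "<ident>", variables.index(word) + 1, tp
    return ch, None, tp + 1                   # must be a special character


if __name__ == "__main__":
    table = []
    print(scan("n1 go 7 ", 0, table), table)
```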

Now skip ahead to the procedure assigned to "compile". After some
initializing we find "while compiling do" which controls a loop down to
the end of the procedure. Within that loop we immediately find the syntactic
analysis algorithm. In the first inner loop we are scanning ahead
to the right under control of the linearized form of function P1'. Having
located the right end of the CRS, we exit the loop and enter a loop scanning
for the left of the CRS under control of the linearized version of
function P2'. At the termination of the second loop, we may compare the
CRS with the production right part table to find the production number
"pn".

"pn" is used as a subscript to select the compile actions corresponding
to that production from the preceding table. The prescript
operator "[compileactions[pn]]" causes the execution, in order, of actions
from the explicit list following the prescript. For instance, the discovery
of production two would cause the integer 12 to be placed in the
code array, the program pointer to be incremented and the variable
"compiling" to be set to false, thus terminating compilation.

At  the  termination  of  each  compilation  step  we  find  the  substitution 
of  the  production  left  part  for  the  CRS. 
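
The prescript mechanism described above, a list of numbers selecting which members of an explicit list are executed, can be imitated in Python with a table of functions. The sketch covers only the three actions named for production two; the names `state`, `ACTIONS` and `run_prescript` are hypothetical, and treating the selector 0 as "no action" is our assumption about the entries written 0 in the compile action table.

```python
state = {"code": {}, "pp": 1, "compiling": True}


def emit_halt():
    state["code"][state["pp"]] = 12      # action 12: code[pp] <- 12


def advance():
    state["pp"] += 1                     # action 17: pp <- pp + 1


def stop():
    state["compiling"] = False           # action 19: compiling <- false


ACTIONS = {12: emit_halt, 17: advance, 19: stop}


def run_prescript(selector, actions):
    """Execute, in order, the actions whose numbers appear in selector."""
    numbers = selector if isinstance(selector, tuple) else (selector,)
    for k in numbers:
        if k != 0:                       # assumption: 0 selects nothing
            actions[k]()


run_prescript((12, 17, 19), ACTIONS)     # the actions for production two
```

After the call the code array holds 12 at position 1, the program pointer is 2 and compilation has stopped, which is the effect described for production two.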
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The  compiler  is  considerably  simplified  by  having  the  entire  code 
array  and  execution  memory  available  at  compile  time. 

The  translated  code  for  the  small  language  consists  of  a  sequence 
of twelve operation codes. Within the procedure assigned to "execute"
we find another prescript "[executeactions[code[pp]]]". The operation
code in "code[pp]" is used to select a sequence of execution actions
from the preceding table. Execution proceeds until the operation code 12
causes the execution action "executing ← false" whereupon execution
terminates.
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mm 


'Micro-mutant, a small version of the extendable compiler'
{new code pp memory variables mp text tp f1 g1 f2 g2
     reservedsymbols productionleftparts productionrightparts
     compile compileactions execute executeactions
     scan nextsymbol scanval,

'Seven tables prepared by a syntax preprocessor'
reservedsymbols ← 'syntactic vocabulary'
  {"<program>", "<stmt>", "<stmt1>", "<if clause>",
   "<label>", "<list head>", "<expr>", "<expr1>",
   "<arith exp>", "<term>", "<term1>", "<factor>",
   "<integer>", "<var>", "eof", "go", "output", "if",
   "<ident>", "begin", "(", "<digit>", "end", "to",
   ":", ",", "←", "+", "-", "×", "/", "then", ")" },

productionrightparts ←
  {{1,15}, {15,2,15}, {3}, {5,3}, {4,3}, {6,23},
   {16,24,7}, {17,7}, {7}, {18,7,32}, {19,25}, {20,2},
   {6,26,2}, {8}, {14,27,8}, {9}, {9,28,10}, {9,29,10},
   {10}, {11}, {11,30,12}, {11,31,12}, {12}, {14},
   {13}, {21,7,33}, {22}, {13,22}, {19}},

productionleftparts ← {1,1,2,3,3,3,3,3,3,4,5,6,6,7,8,
                       8,9,9,9,10,11,11,11,12,12,12,13,13,14},

f1 ← {1,2,3,1,1,1,3,4,4,5,5,6,6,6,3,1,1,1,7,1,1,7,3,
      1,7,1,1,1,1,1,1,7,6},

g1 ← {1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,3,3,3,3,3,3,6,1,
      1,7,1,6,4,4,5,5,3,3},

f2 ← {1,1,1,6,6,1,1,1,1,1,1,1,1,1,7,1,5,5,1,7,5,1,1,
      5,1,7,4,3,3,2,2,1,1},

g2 ← {1,7,6,1,1,1,5,4,1,3,1,2,1,1,1,1,1,1,1,1,1,1,1,
      1,1,1,1,1,1,1,1,1,1},
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scan  <- 


'fetch the next terminal symbol from the text'
{new t, while text[tp] = " "[1] do tp ← tp + 1,
 if text[tp] < 10 then
   {nextsymbol ← "<digit>" index reservedsymbols,
    scanval ← text[tp], tp ← tp + 1
   } else
 if text[tp] < 36 then
   {'catenate an identifier'
    t ← tp, while text[tp] < 36 do tp ← tp + 1,
    t ← text[t to tp-1],
    nextsymbol ← t index reservedsymbols,
    if nextsymbol = 0 then
      {'a variable'
       nextsymbol ← "<ident>" index reservedsymbols,
       scanval ← t index variables,
       if scanval = Ω then variables[scanval ← mp ← mp + 1] ← t
      }
   } else 'must be a special character'
   {nextsymbol ← text[tp to tp] index reservedsymbols,
    tp ← tp + 1
   }
}, 'end of scanning algorithm'
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c  amp  ileact  ions  <- 


{0,              '<program> ::= <program> eof'
 {12,17,19},     '<program> ::= eof <stmt> eof'
 {2,17},         '<stmt> ::= <stmt1>'
 0,              '<stmt1> ::= <label> <stmt1>'
 15,             '<stmt1> ::= <if clause> <stmt1>'
 0,              '<stmt1> ::= <list head> end'
 {1,17},         '<stmt1> ::= go to <expr>'
 {7,17},         '<stmt1> ::= output <expr>'
 0,              '<stmt1> ::= <expr>'
 {3,17,18,17},   '<if clause> ::= if <expr> then'
 13,             '<label> ::= <ident> :'
 0,              '<list head> ::= begin <stmt>'
 0,              '<list head> ::= <list head> , <stmt>'
 0,              '<expr> ::= <expr1>'
 {6,17},         '<expr1> ::= <var> ← <expr1>'
 0,              '<expr1> ::= <arith exp>'
 {8,17},         '<arith exp> ::= <arith exp> + <term>'
 {9,17},         '<arith exp> ::= <arith exp> - <term>'
 0,              '<arith exp> ::= <term>'
 0,              '<term> ::= <term1>'
 {10,17},        '<term1> ::= <term1> × <factor>'
 {11,17},        '<term1> ::= <term1> / <factor>'
 0,              '<term1> ::= <factor>'
 {5,17},         '<factor> ::= <var>'
 {4,17,14,17},   '<factor> ::= <integer>'
 0,              '<factor> ::= ( <expr> )'
 0,              '<integer> ::= <digit>'
 16,             '<integer> ::= <integer> <digit>'
 {4,17,14,17}    '<var> ::= <ident>'
},
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compile  4-  © 

{new x xv xp lp pn compiling,
 pp ← tp ← 1, mp ← 0, compiling ← true,
 x ← xv ← 25 list 0, xp ← 2,
 x[1] ← "(" index reservedsymbols,
 x[2] ← "eof" index reservedsymbols,
 memory ← variables ← 10 list Ω, code ← 100 list 0,
 scan, 'initialize nextsymbol and scanval'
 while compiling do
 {while f1[x[xp]] < g1[nextsymbol] do
    {x[xp←xp+1] ← nextsymbol, xv[xp] ← scanval,
     scan 'the decision for function P1'},
  lp ← xp,
  while f2[x[lp-1]] < g2[x[lp]] do lp ← lp-1,
  'the right part of the next CRS is between lp and xp'
  pn ← x[lp to xp] index productionrightparts,
  'the production number is used as an index to select
   a sequence of compile actions'
  [compileactions[pn]] 'a prescript on the following list'
  {'the first twelve compile actions correspond to
    execution macro-instruction operation codes'
   code[pp] ← 1, code[pp] ← 2, code[pp] ← 3,
   code[pp] ← 4, code[pp] ← 5, code[pp] ← 6,
   code[pp] ← 7, code[pp] ← 8, code[pp] ← 9,
   code[pp] ← 10, code[pp] ← 11, code[pp] ← 12,
   'the remaining 7 rules do fixups, label initialization,
    increment the program pointer, etc.'
   memory[xv[xp-1]] ← pp,                 '13'
   code[pp] ← xv[xp],                     '14'
   code[xv[xp-1]] ← pp,                   '15'
   xv[lp] ← (xv[xp-1] × 10) + xv[xp],     '16'
   pp ← pp + 1,                           '17'
   xv[lp] ← pp,                           '18'
   compiling ← false                      '19'
  },
  xp ← lp, 'making the left-for-right part subst.'
  x[xp] ← productionleftparts[pn]
 }
}, 'end of compilation procedure'
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executeactions  «- 


{{2,15},       'unconditional branch      1'
 {15,1},       'clear stack               2'
 {1,10,15},    'branch on zero            3'
 {4,1,6,1},    'load stack from code      4'
 {8,1},        'load stack from memory    5'
 {9,7,5,1},    'store stack to memory     6'
 {3,1},        'decimal output            7'
 {11,5,1},     'add                       8'
 {12,5,1},     'subtract                  9'
 {13,5,1},     'multiply                 10'
 {14,5,1},     'divide                   11'
 16            'halt                     12'},
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execute  «-  (?) 

{new executing stack sp,
 sp ← 0, pp ← 1, executing ← true,
 stack ← 100 list 0,
 while executing do
   [executeactions[code[pp]]]
   {pp ← pp + 1,                                       '1'
    pp ← stack[sp],                                    '2'
    out ← (stack[sp] base 10) © cr,                    '3'
    sp ← sp + 1,                                       '4'
    sp ← sp - 1,                                       '5'
    stack[sp] ← code[pp],                              '6'
    stack[sp-1] ← stack[sp],                           '7'
    stack[sp] ← memory[stack[sp]],                     '8'
    memory[stack[sp-1]] ← stack[sp],                   '9'
    pp ← if stack[sp] = 0 then code[pp] else pp + 1,   '10'
    stack[sp-1] ← stack[sp-1] + stack[sp],             '11'
    stack[sp-1] ← stack[sp-1] - stack[sp],             '12'
    stack[sp-1] ← stack[sp-1] × stack[sp],             '13'
    stack[sp-1] ← stack[sp-1] ÷ stack[sp],             '14'
    sp ← 0,                                            '15'
    executing ← false                                  '16'
   }
}, 'end of execution procedure'
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'test  program  for  micro-mutant  compiler' 
text ←
"begin n ← 1,
   k: if 1024-n then
        begin output n,
              n ← 2 × n,
              go to k
        end
 end eof eof ",
compile,
out ← "code dump:" © cr © "pp" © tab © "inst" © cr,
for all i from 1 to pp-1 do
  out ← (i base 10) © tab © (code[i] base 10) © cr,
out ← cr,
execute,
out ← cr © "memory dump:" © cr,
for all i from 1 to mp do out ← variables[i] © " = " ©
  (memory[i] base 10) © cr
} eof
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code  dump: 
pp  inst 

1  4 

2  1 

3  4 

4  1 

5  6 

6  2 

7  4 

8  1024 

9  4 

10  1 

11  5 

12  9 

13  3 

14  35 

15  4 

16  1 

17  5 

18  7 

19  2 

20  4 

21  1 

22  4 

23  2 

24  4 

25  1 

26  5 

27  10 

28  6 

29  2 

30  4 

31  2 

32  5 

33  1 

34  2 

35  2 

36  2 

37  12 

1 

2 

4 

8 

16 

32 

64 

128 

256 

512 

memory  dump: 
n  =  1024 
k  =  7 
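
As a check on the listing, the executor can be re-expressed in Python and run on the code dump above. This is a sketch under stated assumptions rather than the original implementation: each operation code is expanded inline into the action sequence given in "executeactions", divide is taken to be integer division, and the label value k = 7 is preloaded into memory because the compiler initializes labels at compile time. The names `CODE` and `execute` are ours.

```python
# Code words copied from the code dump above (positions 1 to 37).
CODE = [4, 1, 4, 1, 6, 2, 4, 1024, 4, 1, 5, 9, 3, 35, 4, 1, 5, 7, 2,
        4, 1, 4, 2, 4, 1, 5, 10, 6, 2, 4, 2, 5, 1, 2, 2, 2, 12]


def execute(code_words, memory):
    """Run the twelve-operation stack code; return the list of output values."""
    code = [None] + list(code_words)      # shift to 1-based addressing
    stack = [0] * 101                     # stack[1..100]; index 0 is unused
    out = []
    pp, sp = 1, 0
    while True:
        op = code[pp]
        if op == 1:                       # unconditional branch
            pp, sp = stack[sp], 0
        elif op == 2:                     # clear stack
            sp, pp = 0, pp + 1
        elif op == 3:                     # branch on zero
            pp += 1
            pp = code[pp] if stack[sp] == 0 else pp + 1
            sp = 0
        elif op == 4:                     # load stack from code
            sp += 1
            stack[sp] = code[pp + 1]
            pp += 2
        elif op == 5:                     # load stack from memory
            stack[sp] = memory[stack[sp]]
            pp += 1
        elif op == 6:                     # store stack to memory
            memory[stack[sp - 1]] = stack[sp]
            stack[sp - 1] = stack[sp]
            sp -= 1
            pp += 1
        elif op == 7:                     # decimal output
            out.append(stack[sp])
            pp += 1
        elif op in (8, 9, 10, 11):        # add, subtract, multiply, divide
            a, b = stack[sp - 1], stack[sp]
            if op == 8:
                result = a + b
            elif op == 9:
                result = a - b
            elif op == 10:
                result = a * b
            else:
                result = a // b           # assumption: integer division
            stack[sp - 1] = result
            sp -= 1
            pp += 1
        elif op == 12:                    # halt
            return out
        else:
            raise ValueError(f"unknown operation code {op} at {pp}")


memory = {2: 7}   # label k, initialized at compile time (see the memory dump)
output = execute(CODE, memory)
```

Running the sketch yields the powers of two from 1 to 512 and leaves n = 1024 in memory cell 1, in agreement with the output and memory dump above.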
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Syntactic  and  Semantic  Definition 


The following table is the phrase structure grammar for the kernel
language. We adopt the Backus-Naur Form of the Algol report, substitute
the reduction symbol ::= for the production arrow of Section 2,
enclose the members of V_N in the brackets < and > and underline
the multicharacter representations for members of V_T. The special symbols
integer, identifier and string are discussed on page 93.

We  remind  our  readers  that  the  grammar  obeys  two  restrictions  that 
occasionally  give  it  an  artificial  appearance.  First,  it  is  a  symbol  pair 
grammar.  Second,  the  productions  have  been  carefully  selected  to  reflect 
the  desired  sequence  of  execution  in  the  canonical  parse  to  simplify  the 
production  of  the  machine  code. 


Symbol Pair Grammar for the Kernel Language

<program>       ::= ⊢ <expression>
<expression>    ::= <expression1>
<expression1>   ::= <if clause> <expression1> |
                    <expression2>
<expression2>   ::= <expression3>
<expression3>   ::= <if clause> <truepart> <expression3> |
                    <primary1> ← <expression3> |
                    <procedure> <expression3> |
                    <for clause> do <expression3> |
                    <for clause> <while clause> do <expression3> |
                    <while clause> do <expression3> |
                    <step list>
<procedure>     ::= ©
<if clause>     ::= if <expression> then
<truepart>      ::= <expression2> else
<for clause>    ::= for all identifier from <step list>
<while clause>  ::= <while> <step list>
<while>         ::= while
<step list>     ::= <simple expr> to <simple expr> |
                    <simple expr> by <simple expr> |
                    <simple expr> to <simple expr> by <simple expr> |
                    <simple expr> by <simple expr> to <simple expr> |
                    <simple expr>
<simple expr>   ::= <simple expr1>
<simple expr1>  ::= <primary> <infix> <simple expr1> |
                    <prefix> <simple expr1> |
                    <infix> / <simple expr1> |
                    <primary>
<primary>       ::= <primary1>
<primary1>      ::= <primary1> [<expression>] | <primary2>
<primary2>      ::= <constant> | (<expression>) |
                    identifier <list> | identifier | <list> | <case>
<list>          ::= <list head> }
<list head>     ::= <list head> , <expression> |
                    <begin> <declaration> | <begin> <expression>
<begin>         ::= {
<case>          ::= <case head> }
<case head>     ::= <case head> , <expression> |
                    <case begin> <declaration> |
                    <case begin> <expression>
<case begin>    ::= [<expression>] {
<declaration>   ::= <declaration1>
<declaration1>  ::= new identifier |
                    <declaration1> identifier
<constant>      ::= true | false | integer | integer . | . integer |
                    integer . integer | universe | string | <begin> }
<infix>         ::= ∩ | ∪ | ⊖ | ⊂ | ⊃ | ∈ | ∉ | index | © | list | base |
                    ∨ | ∧ | < | ≤ | = | ≠ | ≥ | > | max | min |
                    + | - | × | mod | ÷ | ↑
<prefix>        ::= ¬ | minus | abs | type | round | chop | length | set

We  will  now  give  our  interpretations  of  each  construct.  The
description  of  an  involuted  language  involves  the  use  of  terms  before 
they  are  defined.  Paragraph  numbers  and  cross  references  are  used  to 
ease  the  reader's  task  in  following  the  description. 

We  must  distinguish  between  text  describing  the  form  of  a  construct, 
text  giving  examples  of  a  construct,  text  describing  the  meaning  of  a 
construct,  text  justifying  the  choice  of  a  construct  and  text  advocating 
a  particular  system  organization  or  machine  design.  We  distinguish  them 
when  possible  with  paragraph  headings  of  Syntax,  Examples,  Semantic
Description,  Justification,  and  Implementation,  respectively. 

Implementation  of  Reserved  Words,  Identifiers,  Strings  and  Integers.
By  an  underlined  word  in  the  grammar  we  mean  to  reserve  that  word  for 
exclusive  use  in  the  given  grammatical  context.  We  do  not  then  need  spe¬ 
cial  character  sets  or  escape  symbols  to  write  programs.  One  implication 
is that spaces are significant and that we cannot know whether an
identifier is reserved or not until we have seen all of it. Thus we find that
the  process  of  catenating  identifiers  must  take  place  outside  of  (and 
before)  the  syntactical  analysis  algorithm.  We  assign  this  task  to  a 
procedure  called  the  scanner.  It  turns  out  to  be  convenient  to  recognize 
and  convert  both  integers  and  strings  there  also.  As  a  result  we  find  the 
symbols  integer,  identifier  and  string  terminal  in  the  grammar  but  not 
underlined.  The  inclusion  of  natural  language  text  within  a  program  in 
the  form  of  parenthetical  comments  to  the  reader  is  provided  by  choosing 
an  otherwise  unused  character  as  a  comment  bracket.  We  reject  the  Algol  60 
comment  convention  because  it  is  neither  concise  nor  independent  of  the 
program  structure  (since  it  involves  the  use  of  the  semicolon). 
(3-1) Semantic Description of Values. Before we can discuss
constants, we must introduce the values they represent. We specify in
the language four unstructured types of values (undefined, number, name
and process) and three structured types (string, list, and set).

Syntax of Constants.

<constant> ::= Ω | ∞ | universe |
               integer | integer . | . integer | integer . integer |
               true | false |
               <begin> } | string
<begin>    ::= {

Examples of Constants.

Ω    ∞    universe    true    false
2    2.    .2    136.721
{ }    "ABCDEFGHIJ-32"

Semantic Description of Constants of Type Undefined.

Ω, ∞, and universe have type undefined. The value of a variable
before anything has been stored into it is Ω; the result of dividing
a positive number by zero is ∞; the intersection over the null collection
of sets is universe (the universal set).

The operators = and ≠ are valid for all of the above; ∞ is a
valid operand for all numeric operators; universe is a valid operand for
all operators that accept sets as operands.

Implementation  of  Values  of  Type  Undefined.  The  appearance  of  an 
undefined  value  is  usually  cause  for  alarm.  An  alarm  should  cause  the  system 
to  originate  a  warning  action  to  the  programmer,  but  beyond  that  we  make 
no particular recommendation as to the form of the warning or the means of
suppressing it.
Justification of Values of Type Undefined. Undefined values can
arise in a variety of ways. We might think of, for instance, the value
of an uninitialized variable, the result of division by zero and the
result of an invalid subscripting operation. We propose the introduction
of a type undefined and a collection of values of type undefined
corresponding to (usually) pathological situations such as those described
above. For some we may wish explicit constants in the language. Thus we
might write

if X = ∞ then ...

to test for a division by zero.

The introduction of a type undefined provides a conceptually
simple mechanism with which to warn the programmer of some of the wilder
errors as well as providing a relatively noncontroversial system reaction
to the errors. If the error is isolated, the system may proceed with
execution of the program, leaving behind an indicative trail of undefined
values.

Semantic Description of Numbers. A value of type number will be
the computer representation of a real number. We have two reasons for
not wishing to make our concept of a number precise.

First, the only reasonable choice for numbers in a given implementation
of the kernel language will be those acceptable to the floating
point hardware of the machine. For that implementation, the programmer's
knowledge about values of type number will be a pragmatic mixture of
his knowledge of numbers in the abstract and his study of the machine
specifications.

Second, a study of the desirable properties of computer numbers is
well beyond the scope of this paper. We hope to see some results in
this direction in a study presently being conducted by W. Kahan, J. Welsch,
and N. Wirth.

We do find it useful to distinguish three subsets of the class of
computer numbers, the first of which is computer integers. The second
(3-2) is the set of characters, which is the set of integers restricted to
the range 0 to 255 inclusive. Finally (3-3) we have the logical values
0 and 1.

Semantic Description of Strings. We consider a fixed input or
output device. We assume a correspondence between the printing characters
of the device and the characters (see 3-2). Normally some of the
characters are unused for printable characters and may be used for
nonprinting or control functions. A string in the kernel language is an
ordered set of printable characters delimited by the string quote (").
We adopt the PL/I convention that within the string, two contiguous
string quotes signify a single string quote within the string.
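
The quoting convention can be illustrated with a short Python sketch. It is
not the scanner described in this paper; the function name
read_string_constant and its return convention are assumptions made for
the illustration.

```python
def read_string_constant(text, start=0):
    """Scan a quoted string constant beginning at text[start].

    Returns (value, next_index). Inside the constant, two contiguous
    string quotes stand for a single quote character.
    """
    if text[start] != '"':
        raise ValueError("string constant must begin with a string quote")
    chars = []
    i = start + 1
    while i < len(text):
        if text[i] == '"':
            if i + 1 < len(text) and text[i + 1] == '"':
                chars.append('"')          # doubled quote: one literal quote
                i += 2
                continue
            return "".join(chars), i + 1   # single quote: end of constant
        chars.append(text[i])
        i += 1
    raise ValueError("unterminated string constant")
```

The scan needs one character of lookahead after each quote to decide
whether the constant has ended.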

Justification of Strings. The programmer communicates with his
program via strings of characters; thus unrestricted ability to analyze,
manipulate and produce character strings is a minimal requirement for
any computer language. In much the same spirit that a compiler for
numerical work provides certain standard functions such as square root
and natural logarithm, we must provide primitive string manipulating
functions.

Implementation of Input and Output. For the kernel language
we assume that we have a single input medium and a single output medium.
If we view the program over the history of its execution, the input and
output  are  each  single  contiguous  strings  of  characters.  We  name  two 
special  variables  (IN  and  OUT)  and  access  them  in  the  normal  manner  with 
our  primitive  string  manipulating  functions.  The  fact  that  in  real  time, 
the  program  may  have  to  wait  before  an  access  to  IN  can  be  completed  does 
not  affect  the  program  logic.  On  the  other  hand,  the  program  must  have 
control  over  when,  in  real  time,  the  output  appears.  Thus  we  establish 
the  convention  that  whenever  a  carriage  return  is  catenated  onto  OUT,  the 
string  OUT  is  shortened  past  the  carriage  return  and  the  excised  characters 
appear  on  the  output  medium. 
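
The output convention can be modelled in a few lines of Python. This is a
sketch under stated assumptions: the class name OutputString and the use
of "\r" as the carriage return character are hypothetical, and the list
called emitted merely stands in for the output medium.

```python
CR = "\r"   # assumed encoding of the carriage return character


class OutputString:
    """Model of the special variable OUT (hypothetical class name)."""

    def __init__(self):
        self.pending = ""    # characters catenated but not yet emitted
        self.emitted = []    # what has appeared on the output medium

    def catenate(self, text):
        self.pending += text
        # Each carriage return excises everything up to and including
        # itself and sends it to the output medium.
        while CR in self.pending:
            line, self.pending = self.pending.split(CR, 1)
            self.emitted.append(line + CR)
```

Text catenated without a carriage return stays in the string and remains
available to the program until a later carriage return releases it.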

Semantic Description of the Null List. The construct { } represents
the null set of values. We use the dummy production <begin> for technical
reasons having to do with the emission of block entry code from the
canonical parse.

Semantic Description of true and false. The constants true and
false are synonymous with the characters 1 and 0.

Semantic Description of Variables. A variable is an object which
can be named in the kernel language and to which any value can be assigned.
The designation of a variable is given by either an identifier or a
subscripted identifier (see 3-9).

Semantic Description of Values of Type Name. Corresponding to
every valid name in the kernel language is a value of type name within
the system. Names are created as intermediate results and are not
accessible to the programmer.


Implementation  of  Variables  and  Names.  A  variable  corresponds  to 
a  memory  address.  The  type  of  the  value  stored  in  a  variable  must  be 
preserved,  thus  we  find  that  we  allocate  two  words  for  a  variable  and  use 
the  second  to  store  the  type  information.  We  would  prefer  a  machine  in 
which  the  type  bits  were  automatically  associated  with  each  word  but  had 
special  properties.  In  particular  we  would  like  to  determine  whether  the 
variable  contains  a  value  of  type  address  to  effect  indirect  addressing 
but  without  accessing  the  whole  variable  to  find  out.  We  believe  this 
implies  that  at  least  some  of  the  type  bits  must  be  accessible  in  a 
fraction  of  the  time  to  access  a  memory  word. 

Syntax of Declarations.

<declaration>  ::= <declaration1>
<declaration1> ::= new identifier |
                   <declaration1> identifier

Examples of Declarations.

new abcd

new thisone thatone anyone

Semantic  Description  of  Declarations.  At  most  one  declaration  appears 
in  the  head  of  a  list  (see  3-5).  The  extent  of  the  list  defines  the  scope 
of  the  identifiers  in  the  declaration.  Every  identifier  in  a  program  must 
either  be  reserved  or  lie  within  the  scope  of  an  identifier  of  the  same 
name.  Upon  entry  into  the  scope  of  an  identifier,  the  system  allocates 
a  variable  to  it  and  gives  it  the  value  uninitialized.  An  identifier  names 
the  variable  allocated  to  it.  If  an  identifier  appears  in  more  than  one 
declaration,  the  use  of  that  identifier  names  the  variable  corresponding 
to  the  smallest  containing  scope. 
Justification of Declarations. The use of declarations to define
the scope of variables is well established. With dynamic typing of values,
we find no particular advantage in binding the type of an identifier with a
compile-time declaration. The involuted nature of the kernel language
moves the structural implications of the conventional declaration out into
the main program. Thus we find that declarations in the kernel language
are reduced to the single action of delineating the scope of, and allocating
variables for, identifiers used in the program.

We regard this as the final step in the direction taken by Wirth
and Weber ([25] p. 45).

Implementation of Declarations. From the viewpoint of variable
addressing, the program consists of a nested collection of scopes. Thus
from any point in the program we may assign a unique ordered pair of
integers to each variable, namely, the depth of nesting of the scope of
the identifier and the position of the identifier in the declaration.
We call the integers the scope level and order number respectively. At
compile time we can name the variables with the scope level and order
number.

The  form  of  the  declaration  suggests  that  we  should  allocate  a 
list  of  variables  corresponding  to  the  declared  identifiers  upon  entry 
into  the  scope  of  the  identifiers.  The  order  number  of  a  variable  is
the  index  of  that  variable  in  the  list  of  local  variables.  Thus  we 
expect  to  use  the  scope  level  to  find  a  particular  list  and  the  order 
number  to  find  an  element  of  that  list.  At  execution  time  we  convert 
the  compile  time  name  into  a  value  of  type  name  by  locating  the  memory 
location  assigned  to  the  variable. 
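
The two-stage naming scheme can be sketched in Python. The function names,
the 1-based numbering, and the use of a list of lists (a display) to find
the variables of each active scope are assumptions made for illustration;
the text above fixes only the pair (scope level, order number).

```python
def compile_names(scopes):
    """Assign (scope level, order number) to each visible identifier.

    `scopes` lists the declarations of the enclosing scopes, outermost
    first. An inner declaration hides an outer one of the same name,
    so the smallest containing scope wins.
    """
    names = {}
    for level, declared in enumerate(scopes, start=1):
        for order, identifier in enumerate(declared, start=1):
            names[identifier] = (level, order)
    return names


def locate(display, compile_time_name):
    """Convert a compile-time name into a run-time location.

    `display` holds one list of variables per active scope level. The
    result (list, index) stands in for a memory address.
    """
    level, order = compile_time_name
    return display[level - 1], order - 1
```

For the nested declarations "new a b" and "new a c", the identifier a
resolves to the inner variable (2, 1) while b still resolves to (1, 2).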
(3-4) The designers of programming languages have traditionally
indulged themselves in a semantic ambiguity: one cannot always tell
from the form of an expression (a subscripted variable for instance)
whether the name, or the value stored in the named location, is
indicated. In the Algol 60 construct of general call by name the ambiguity
is complete; the expression must yield both name and value, the choice
depending upon its use at a remote location. One can remove the ambiguity
by introducing explicit name and value operators into the language
([25] p. 45). Since the choice is always ultimately clear from the
context in which the expression is found, we have chosen to dynamically
defer the final fetch of the value in cases where there is doubt.

Syntax of Lists.

<list>      ::= <list head> }
<list head> ::= <list head> , <expression> |
                <begin> <declaration> |
                <begin> <expression>
<begin>     ::= {

Examples of Lists.

{1, 2, 3, "ABC"}

{x ← 1, y ← y-2, if x < y then z else z ← y}

{new abc, a ← b ← 1.0, c ← 5}

{new a, {new a, a ← ?}, a ← 2}
(3-5) Semantic Description of Lists. A list is an ordered set of
expressions which are evaluated sequentially. The value of the evaluated
list is the ordered set of evaluated expressions and is of type list.
The declaration is not an expression and does not contribute to the value
of the list.

Justification  of  Lists.  Arrays,  trees,  iteration  lists,  parameter 
lists,  strings,  blocks  and  compound  statements  are  ordered  sets.  The 
inclusion of arbitrary (even infinite) lists in the kernel language
together with the principle of involution yields a drastic reduction in the
number  of  primitive  concepts. 

Semantic  Description  of  Values  of  Type  List.  A  value  of  type  list
is  an  ordered  collection  of  values  with  any  admixture  of  the  value  types. 

Implementation of Lists. We discover in the literature two
alternatives for representing lists. The first, in LISP, demands a list
structure where all elements are explicitly linked in storage. In Euler and
Burroughs B5500 hardware we find that a value of type list is a descriptor
which delineates the extent and locates the list. The list elements are
stored in sequentially contiguous memory locations. The first comparison is
in the amount of storage required to represent a given list. In LISP we
must use memory for the linking information; in Euler we must use memory
for dynamic typing. We estimate that implicit linking saves a factor of
two in memory. The second comparison is in ease of access. In LISP we
must explicitly trace the list structure to find an element near the end
of a list; in Euler we may access any element of any list directly via a
subscript. There is no reason to expect the implicit list structure
organization to be less efficient than conventional index registers for
array applications so long as descriptors do not have to be repeatedly
fetched from memory. Even with the repeated fetching, the B5500 is
able to subsume the extra core accesses under cover of the multiply
operation time so as to be proportionately as fast as the 7094 for
matrix problems. Our third comparison is in ease of modification. In
LISP we must change a link to append or insert an element to a list, whereas
in Euler we must copy the entire contiguous block. Implicit linking is
severely less efficient here. Fourth, we must consider storage
reclamation. In both systems the majority of time is spent in searching out
and identifying the valid list structure. In Euler we find that the
percentage of execution time spent in storage reclamation is roughly the
same as the percentage of storage occupied with valid list structure; we
have no figures on LISP. In any case we do not expect the systems to be
much different in this respect.

We  do  not  know  which  represents  the  most  efficient  solution;  we 
suspect  that  it  is  both  problem  dependent  and  hardware  dependent.  We 
have chosen implicit linking so as to have array capability without
introducing arrays into the language as a distinct form.

Semantic  Description  of  Values  of  Type  Set.  A  set  differs  from 
a  list  in  two  ways: 

(1)  A  set  cannot  contain  two  equal  values. 

(2)  The  programmer  cannot  prescribe  the  order  of  the  members  of  the
set.

Certain  operations  are  allowed  on  sets  and  not  on  lists.

Justification of Values of Type Set. The set operations of
membership, inclusion, equivalence, etc., require preorganization for
efficient implementation. We choose to sort the values of a set by a
machine determined order to facilitate table look ups (binary searches),
union and intersection (merges), etc. The membership operation (for
instance) takes log₂ n operations in a sorted set and n/2 operations in
an unsorted set (on the average).
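
The effect of keeping sets sorted can be sketched in Python. The function
names are hypothetical, and Python's built-in ordering of numbers stands in
for the machine determined order; the sketch shows membership by binary
search, membership by linear scan, and union by merging.

```python
from bisect import bisect_left


def member_sorted(sorted_values, x):
    """Binary search: about log2(n) comparisons."""
    i = bisect_left(sorted_values, x)
    return i < len(sorted_values) and sorted_values[i] == x


def member_unsorted(values, x):
    """Linear scan: about n/2 comparisons on average when x is present."""
    for v in values:
        if v == x:
            return True
    return False


def union_sorted(a, b):
    """Merge two sorted, duplicate-free lists into their sorted union."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            result.append(a[i])
            i += 1
        elif b[j] < a[i]:
            result.append(b[j])
            j += 1
        else:                  # equal values appear once in the union
            result.append(a[i])
            i += 1
            j += 1
    return result + a[i:] + b[j:]
```

The merge visits each element once, so union costs time proportional to
the combined length of the operands.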

Syntax  of  Subscripts. 

[<expression>] 

Examples of Subscripts.

[i]    [x-z]    [{1,2,3}]

Semantic Description of Subscripts. We will distinguish between
the subscript expression (the expression in the syntax above), the
subscript operator (the result of applying certain standard transformations
to the subscript expression), the subscript operand (the object in the
kernel language to which the subscript operator is being applied) and
the subscripted expression (the final result achieved by applying the
subscript operator to the subscript operand). A subscript expression
has meaning if (1) it has type number or (2) it has type list and all its
members have type number. A subscripted expression has meaning if (1) the
subscript has meaning and (2) the subscript operand is one of the structured
types, string, list or set. If a subscripted expression does not have
meaning, it yields a value of type undefined.

Subscripts  of  Type  Number.  If  the  value  of  the  subscript  expression 
is  of  type  number,  the  value  of  the  subscript  operator  is  the  nearest 
(rounded  up)  integer. 

Subscripts  of  Type  List.  If  the  value  of  the  subscript  expression 
is  of  type  list  and  each  element  of  the  list  is  of  type  number  then  the 
subscript operator is the list of nearest integers (rounded up)
corresponding to the numbers in the subscript expression.
Justification  of  Subscripts.  Various  constructs  in  the  kernel
language  have  the  form  of  ordered  sets.  Numerical  subscripts  will  be 
used  to  select  elements  from  the  ordered  sets  and  list  valued  subscripts 
will  be  used  to  select  subsets  from  the  ordered  sets. 


Examples of Subscripted Lists.

list                      subscript      result

{10, 20, 30, 40}          [1]            = 10
{10, 20, 30, 40}          [minus 1]      = 40
{10, 20, 30, 40}          [{2,4}]        = {20, 40}
{10, 20, 30, 40}          [1 to 3]       = {10, 20, 30}
{10, {20, {30}, 40}}      [2]            = {20, {30}, 40}
{10, {20, {30}, 40}}      [2][2]         = {30}
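
The subscripting rules for lists can be sketched in Python. Several points
are assumptions: "nearest (rounded up)" is read as rounding halves upward,
the sentinel UNDEFINED stands in for a value of type undefined, the step
list 1 to 3 is written out as [1, 2, 3], and the sketch returns values
rather than names.

```python
import math

UNDEFINED = object()   # stand-in for a value of type undefined


def nearest_integer(x):
    """Nearest integer, with halves rounded up (an assumed reading)."""
    return math.floor(x + 0.5)


def select(operand, k):
    """Select one element: positive k counts from the front,
    negative k from the rear; zero or out of range is undefined."""
    if k == 0 or abs(k) > len(operand):
        return UNDEFINED
    return operand[k - 1] if k > 0 else operand[k]


def subscript(operand, expr):
    """Apply a number valued or list valued subscript to a list."""
    if isinstance(expr, list):
        return [select(operand, nearest_integer(k)) for k in expr]
    return select(operand, nearest_integer(expr))
```

With values = [10, 20, 30, 40], the call subscript(values, [2, 4]) selects
the sublist of the second and fourth elements, matching the third row of
the table of examples.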
Syntax of the Case Expression.

<case>       ::= <case head> }
<case head>  ::= <case head> , <expression> |
                 <case begin> <declaration> |
                 <case begin> <expression>
<case begin> ::= [<expression>] {

Examples of the Case Expression.

[n] {1,2,3,5,7,11,15,17,19}

[{x, minus 1}] {new a,
    a ← "Invalid type for subscript operator",
    a ← "Invalid type for subscript operand",
    a ← "Subscript out of range",
    out ← a © cr}

Semantic Description of the Case Expression. [13] The case
expression has the form of an explicit list preceded by a subscript.
Upon execution the following occurs: (1) The subscript operator is
evaluated. (2) The list is entered. (3) Storage is allocated for
the local variables (if any). If the subscript operator is an integer
then we have (4) If the value of the integer is zero or larger in
magnitude than the number of expressions in the list, a value of type
undefined results. If the subscript operator is positive then it is used
as an index to select an expression counting from the front of the list; if
it is negative it is used to select an expression counting from the rear
of the list. (5) The selected expression is evaluated and the value of
the case expression is the value of the selected expression. If the
subscript operator is of type list then (4) Each number in the list is
used sequentially to select an expression as done above for subscript
operators of type number. (5) The value of the case expression is the
list of values so computed.
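
The selection steps can be sketched in Python by representing each
unevaluated expression as a zero-argument function. The function name case,
the sentinel UNDEFINED and the use of Python callables are assumptions; the
allocation of local variables in step (3) is not modelled.

```python
UNDEFINED = object()   # stand-in for a value of type undefined


def case(selector, thunks):
    """Evaluate a case expression.

    `selector` is the subscript operator (an integer or a list of
    integers); `thunks` holds one zero-argument callable per
    expression. Only the selected expressions are evaluated.
    """
    def pick(k):
        if k == 0 or abs(k) > len(thunks):
            return UNDEFINED
        chosen = thunks[k - 1] if k > 0 else thunks[k]
        return chosen()

    if isinstance(selector, list):
        # A list valued selector evaluates expressions in the order
        # given, allowing reordering and repetition.
        return [pick(k) for k in selector]
    return pick(selector)
```

Because the expressions are evaluated only when selected, a list valued
selector such as [3, 1, 1] runs the third expression once and the first
expression twice.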

Implementation  of  Case  Expressions.  The  use  of  an  index  to  select 
an  expression  out  of  a  list  of  expressions  suggests  that  the  machine  code 
itself  should  have  the  form  of  a  list  structure  where  the  code  for  an 
expression  occupies  exactly  one  memory  location. 

Justification  of  Case  Expressions.  The  case  expression  represents 
one  of  the  more  powerful  sequence  controlling  features  of  the  kernel 
language.  If  the  subscript  operator  is  a  number,  it  resembles  the  Algol 
60  switch  without  the  nuisance  of  labels.  The  list  valued  subscript 
operator  allows  reordering  and  repetition  of  the  expressions  in  a  list. 
Syntax of Primaries.

<primary>  ::= <primary1>
<primary1> ::= <primary1> [<expression>] | <primary2>
<primary2> ::= <constant> | (<expression>) | identifier <list> |
               identifier | <list> | <case>

Examples of Primaries.

3.0    (x-z)    X    X[2][a-2]    Y{1,2,3}    {1,2,3}    [n]{1,2,3}

Semantic  Description  of  Primaries.  Parentheses  allow  the  programmer
to  reorder  the  evaluation  of  operators  in  the  conventional  manner.  They 
have  no  other  meaning  in  the  kernel  language. 

An  identifier  followed  by  a  list  signifies  a  procedure  activation. 
The list (of parameters) is evaluated and the name of the variable
corresponding to the identifier is computed. If the variable contains a value
of  type  process  the  process  is  activated,  otherwise  the  value  undefined 
is  returned.  (See  3-12).  (3-6) 

If an identifier appears alone, the name of the variable
corresponding to the identifier is first computed. If that variable holds a
value of type process, the process is activated and the name of the
identifier is replaced with the value of the process. (See 3-11).

(3-7) Semantic Description of Subscripted Primaries. If the
subscript operand has type name, it is replaced by the value of the named
variable. The effect of the subscript operand on types string and list
follows.
(3-8) Semantic Description of Subscripted Strings. A value of type
string is an ordered set of characters. If the subscript operand is of
type string the following remarks apply: (1) If the subscript operator
is an integer and this integer is positive and less than or equal to the
length of the string, the value of the subscripted expression is the
character selected by counting from the front of the string; if the integer
is negative and no larger, in magnitude, than the length of the string,
the character is selected by counting from the rear of the string;
otherwise the value undefined is returned. (2) If the subscript operator is
a list of integers then the result is a (sub)string selected by applying
each integer as a subscript operator in order of occurrence.

Implementation of Strings. If we view a string as a packed read-only
data structure then the operation of forming a contiguous substring
can  be  accomplished  by  constructing  a  new  descriptor  to  point  into  the 
old  string.  An  implication  is  that  a  scanning  algorithm  does  not  have  to 
move  characters,  only  locate  them. 
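
The descriptor idea can be sketched in Python. The class name
StringDescriptor and its fields are assumptions; a Python str plays the
role of the packed read-only storage, and a descriptor records only where
in that storage its characters lie.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringDescriptor:
    """Descriptor for a read-only string: shared storage plus extent."""
    storage: str   # packed characters, never modified
    offset: int    # position of the first described character
    length: int    # number of characters described

    def substring(self, first, last):
        """Descriptor for characters first..last (1-based, inclusive).

        No characters are moved; only a new descriptor is built.
        """
        if not (1 <= first <= last <= self.length):
            raise IndexError("substring out of range")
        return StringDescriptor(self.storage,
                                self.offset + first - 1,
                                last - first + 1)

    def text(self):
        """Materialize the described characters (for display only)."""
        return self.storage[self.offset:self.offset + self.length]
```

Forming a substring of a substring again costs only a new descriptor; both
continue to refer to the same underlying storage.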

(3-9) Semantic Description of Subscripted Lists. A value of type list
is an ordered set of values. If the subscript operand is of type list
the following remarks apply: (1) If the subscript operator is an
integer then the value returned is the name of the variable selected
according to the algorithm given in paragraph (3-8). (2) If the
subscript operator is a list of integers then the result is the (sub)list
selected by applying each integer as a subscript operator in their order
of occurrence in the subscript operator.


(3-10) Syntax of Prefix Operators.

<prefix> ::= ¬ | minus | type | abs | round | chop | length | set

Semantic Description of Prefix Operators. A prefix operator is
a single valued partial function of one operand. The action of the operator
is defined when the function is given over the allowed range of the operand.
All of the above operators, except type, length and set, are numeric prefix
operators. Their behavior for numeric operands is obvious; their behavior
for list valued operands is discussed presently.

Semantic Description of the Operator "type". The range of
operands for type is the collection of all values. The function defined
by the operator gives an integer corresponding to the type of the operand.
We leave the actual integer to be implementation defined since it is
convenient to have more than one system type corresponding to a given
kernel language type. Normally we test for type with a construct like

if (type a) = (type " ") then ...

rather than attempting to remember the correspondence between integers
and types.

Semantic Description of the Operator "length". The operand of
length must have type set, list or string. The value of the function
defined by the operator is the number of elements in the structured
operand.

An application of the operator set is the only way to transform
a value of type list into a value of type set. The resulting value will
have no repeated elements and will have been reordered.
Syntax of Infix Operators.

<infix> ::= ∩ | ∪ | ⊖ | ⊂ | ⊃ | ∈ | ∉ | index | © | list | base |
            ∨ | ∧ | < | ≤ | = | ≠ | ≥ | > | max | min | + | - |
            × | mod | ÷ | ↑

Semantic  Description  of  Infix  Operators.  An  infix  operator  is  a
single  valued  partial  function  of  two  (right  and  left)  operands.  The 
action  of  the  operator  is  defined  when  the  function  is  given  over  the 
allowed  range  of  the  operands. 

Semantic  Description  of  ∩,  ∪,  ⊖,  ⊂,  and  ⊃.  The  range  of  values
for  both  operands  is  the  collection  of  all  values  of  type  set.  Their 
defining  functions  are,  respectively,  set  intersection,  union,  difference, 
inclusion  and  containment. 

Semantic Description of ∈ and ∉. The left operand ranges
over the collection of all values; the right operand must be of type set
or list. The value of the function defining ∈ is true if a value
equal to the left operand is found in the right operand. The function
defining ∉ is the complement of the above.

Justification  of  Set  Operators.  The  concept  of  a  set  is  a  natural
data  type  for  many  algorithms.  Its  simplicity  makes  the  set  a  natural 
object  for  the  kernel  language. 

Implementation of Set Operators. The elements of the set valued
operands of the above operators are sorted to facilitate the construction
of efficient algorithms for their execution (sort-merge, binary look up,
etc.).
Semantic Description of index. Index is identical to ∈ except
that the resulting value is the index within the set of the value, if
found, and of type undefined otherwise.

Semantic Description of base. The operands of base must both be
integers. The result is a value of type string. The string is the legible
representation of the left operand to the base specified by the right
operand.
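
A Python sketch of such a conversion follows. The digit alphabet, the
treatment of negative operands with a leading minus sign, and the limit on
the base are all assumptions; the description above fixes only that two
integers yield a legible string.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # assumed digit alphabet


def base(value, radix):
    """Legible representation of the integer `value` in base `radix`."""
    if not 2 <= radix <= len(DIGITS):
        raise ValueError("radix out of range for this sketch")
    if value == 0:
        return "0"
    digits = []
    n = abs(value)
    while n > 0:
        n, r = divmod(n, radix)    # peel off the low-order digit
        digits.append(DIGITS[r])
    sign = "-" if value < 0 else ""
    return sign + "".join(reversed(digits))
```

Digits are produced low-order first and reversed at the end, so the result
reads in the conventional left-to-right order.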

Semantic Description of list. The left operand of list must be a
number and the right may have any value. The left operand is rounded to
the nearest integer and the result is that many copies of the right
operand (thus a value of type list).

Semantic  Description  of  ©.  The  range  of  operands  of  ©  is  the 
collection  of  values  for  which  the  types  of  the  operands  (left  and  right 
respectively)  are  string,  string;  set,  list;  set,  set;  list,  set;  list, 
list.  In  the  first  case  the  result  is  a  string  obtained  by  catenating 
the right operand onto the tail of the left operand. Otherwise the
result is a list containing the members of the left operand followed by
the  members  of  the  right  operand. 

Semantic Description of = and ≠. The operands of = and ≠
may range over all values. If the operands do not have the same type,
they are unequal. If they have an unstructured type, they are equal if
they are identical. If they have a structured type, they are equal if
they have the same length and the corresponding elements are equal.

Semantic Description of Numeric Infix Operators. All of the
remaining operators are numeric infix operators. If both operands are
of type number, the function defining the operators is usually obvious.
We make the following comments. The operators ∨ and ∧ (logical "or"
and logical "and") accept as operands only logical values (see 3-3). The
result of s ÷ t is the (real valued) quotient. If we wish the integer
quotient, we write chop(s ÷ t). s mod t is defined to be the function
s - t × chop(s ÷ t) for all numbers.
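
The definition of mod can be checked with a short Python sketch. It assumes that chop truncates toward zero, which the name suggests but this passage does not state; the name kernel_mod is hypothetical.

```python
import math

def chop(x):
    """Assumed meaning of chop: drop the fractional part (truncate toward zero)."""
    return math.trunc(x)

def kernel_mod(s, t):
    """s mod t as defined in the text: s - t x chop(s / t)."""
    return s - t * chop(s / t)
```

Under that assumption the result takes the sign of the left operand, so kernel_mod(-7, 3) is -1, whereas Python's own % operator, which is based on the floor of the quotient, gives 2.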

Syntax of Simple Expressions.

<simple expr>  ::= <simple expr^>

<simple expr^> ::= <primary> <infix> <simple expr^> |
                   <prefix> <simple expr^> |
                   <infix> / <simple expr^> | <primary>

Examples of Simple Expressions.

3 - 2 - 1          a + b - c × d mod e ÷ f ↑ g

~"l                (minus abs round chop a) = (b max c min d)

+ / 1 to n         (1,2,3) - (2,3,4)

Semantic  Description  of  Simple  Expressions.  From  the  grammar  above 
we  deduce  that  the  operand  of  a  prefix  operator  is  the  value  of  the 
(largest  possible)  simple  expression  to  its  right.  The  operands  of  an 
infix  operator  are  the  primary  to  its  left  and  the  (largest  possible) 
simple  expression  to  its  right.  We  further  deduce  that  all  operators 
(excepting  those  reordered  by  parentheses)  are  evaluated  right-to-left. 
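
The rule can be made concrete with a small Python evaluator that follows the grammar directly. This is a sketch under assumptions: expressions arrive as a list of tokens, primaries are plain numbers (no parentheses or subscripts), and only a few operators are provided under hypothetical spellings.

```python
import operator

INFIX = {"+": operator.add, "-": operator.sub, "x": operator.mul}
PREFIX = {"minus": operator.neg, "abs": abs}

def _simple(tokens):
    """Parse one simple expression; return (value, unconsumed tokens)."""
    head, rest = tokens[0], tokens[1:]
    if head in PREFIX:
        # A prefix operator applies to the largest simple expression to its right.
        operand, rest = _simple(rest)
        return PREFIX[head](operand), rest
    if rest and rest[0] in INFIX:
        # Left operand is the primary; right operand is everything that follows.
        right, remaining = _simple(rest[1:])
        return INFIX[rest[0]](head, right), remaining
    return head, rest

def evaluate(tokens):
    """Evaluate a token list with no operator hierarchy, right to left."""
    value, rest = _simple(list(tokens))
    if rest:
        raise ValueError("unexpected tokens: %r" % (rest,))
    return value
```

With this rule 3 - 2 - 1 is read as 3 - (2 - 1) and yields 2, and 2 x 3 + 4 is read as 2 x (3 + 4) and yields 14.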

Justification of Right to Left. We have provided a fairly extensive
catalog of operators in the kernel language while leaving room for further
extensions. With so many operators it would be confusing at best to
assign hierarchies to them. In search of a simple rule ordering the
evaluations, we are left with either left-to-right or right-to-left
order ([15] p. 8). The normal (and only reasonable) interpretation of
prefix operators demands a right-to-left ordering among themselves. We
choose the same order for infix operators as a concession to consistency.

Semantic  Description  of  List  Valued  Operands  for  Numeric  Operators. 
If  a  numeric  prefix  operator  finds  a  list  as  operand,  we  will  follow 
Iverson ([15] p. 13) in generalizing the operator to yield the list of
values  obtained  by  applying  the  prefix  operator  to  the  members  of  the 
operand  in  order.  If  a  numeric  infix  operator  finds  a  value  of  type  list 
and  a  value  of  type  number  as  operands,  the  result  is  the  list  obtained 
by  applying  the  operator  successively  between  the  number  and  elements  of 
the  list.  If  the  operator  finds  two  lists  as  operands,  the  result  is  the 
list  obtained  by  applying  the  operator  between  corresponding  members  of 
the  lists.  The  operation  terminates  on  the  shorter  of  the  two  lists. 

More formally, let s and t be numbers and S and T be lists.
Then if ⊙ is a numeric prefix operator, the following are equivalent
(see 3-13):

    ⊙ S        for all v from S do ⊙ v

If ⊙ is a numeric infix operator then the following are equivalent:

    s ⊙ T      for all v from T do s ⊙ v

    S ⊙ t      for all v from S do v ⊙ t

    S ⊙ T      for all i from 1 to (length S) min (length T)
                   do S[i] ⊙ T[i]
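
The generalization of numeric operators to list operands can be sketched in Python as follows. The helper names are hypothetical and kernel-language lists are modeled as Python lists; the truncation to the shorter list comes from zip.

```python
import operator

def apply_prefix(op, x):
    """Apply a prefix operator to a number, or to each member of a list in order."""
    return [op(v) for v in x] if isinstance(x, list) else op(x)

def apply_infix(op, left, right):
    """Apply an infix operator to numbers and lists as described in the text."""
    left_is_list = isinstance(left, list)
    right_is_list = isinstance(right, list)
    if left_is_list and right_is_list:
        # Corresponding members; zip stops at the end of the shorter list.
        return [op(a, b) for a, b in zip(left, right)]
    if left_is_list:
        return [op(v, right) for v in left]
    if right_is_list:
        return [op(left, v) for v in right]
    return op(left, right)
```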
Semantic Description of Compression. If ⊙ is any infix operator
then the following are equivalent:

    ⊙ / T        for all v from T do u ← u ⊙ v

The latter depends upon the initial value of u, for which we specify
the following:

    ∪ ,  ( ) ;
    ∩ ,  universe ;
    © ,  Ω ;
    base ,  Ω ;
    ∨ ,  0 ;
    ∧ ,  1 ;
    < ,  ≤ ,  = ,  ≠ ,  ≥ ,  > ,  all undefined ;
    max ,  -∞ ;
    min ,  ∞ ;
    + ,  0 ;
    - ,  0 ;
    × ,  1 ;
    mod ,  1 ;
    ÷ ,  1 ;
    ↑ ,  Ω .
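
Compression as defined by the loop above is a left-to-right accumulation from a per-operator initial value. The Python sketch below covers only four operators whose initial values are unambiguous in the table; the operator spellings and the name compress are hypothetical.

```python
import math
import operator
from functools import reduce

# Operator name -> (function, initial value of u), for a subset of the table.
INITIAL = {
    "+": (operator.add, 0),
    "x": (operator.mul, 1),
    "max": (max, -math.inf),
    "min": (min, math.inf),
}

def compress(name, values):
    """Equivalent of: for all v from values do u <- u (op) v, starting from the initial u."""
    op, initial = INITIAL[name]
    return reduce(op, values, initial)
```

Compressing an empty list therefore returns the initial value itself, for example 0 for + and 1 for x.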
Justification of Compression. Compression, as well as the other
generalizations of the numeric operators in the paragraphs preceding,
is a concise way of expressing common programming tasks. Furthermore,
as pointed out by R. S. Barton, they provide a mnemonic notation for
ignoring the order of execution so that, if parallelism is available,
it can be utilized. For example, the inner product:

    + / u × v

of vectors of length n can be performed in log2 n + 2 operation times
if n multipliers and n ÷ 2 adders are available.
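
The parallel schedule behind this estimate can be simulated sequentially. The Python sketch below counts one operation time for all the multiplications and one for each level of pairwise additions; the function name and the step accounting are assumptions of the sketch. It arrives at 1 + ceil(log2 n) steps, one fewer than the figure quoted above, which may include an operation time that is not modeled here.

```python
def inner_product_tree(u, v):
    """Return (inner product, number of parallel operation times used)."""
    # One operation time: every product is formed at once by its own multiplier.
    level = [a * b for a, b in zip(u, v)]
    steps = 1
    while len(level) > 1:
        # One operation time: adjacent pairs are summed by separate adders.
        paired = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])     # odd element waits for the next level
        level = paired
        steps += 1
    return (level[0] if level else 0), steps
```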

Syntax of Step Lists.

<step list> ::= <simple expr> to <simple expr> |
                <simple expr> by <simple expr> |
                <simple expr> by <simple expr> to <simple expr> |
                <simple expr> to <simple expr> by <simple expr> |
                <simple expr>

Examples of Step Lists.

2 by minus 2 to minus 16          1 to n

x-z to X[n] by 2                  1 by 1

Semantic Description of Step Lists. A step list is a list of values
of type number. The value of the first expression above is called the
initial value; the value following the to is called the limiting value;
the value following the by is called the step value. The evaluation of
the step list proceeds as follows:

(1) All the expressions are evaluated in the order of their
occurrence in the program.

(2) If the step value is missing it is replaced by 1.

(3) If the limiting value is missing, it is replaced by a value
of type undefined.

(4) If all the values thus computed are of type number, the step
list has for value all the numbers of the form (initial value) + (n) ×
(step value) lying between (inclusive) the initial and limiting values
where n ranges over the integers from 0 to infinity. If the limiting
value is undefined, the list is infinite. In any other case the value
of the step list is undefined.
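
The four evaluation rules can be sketched as a Python generator. In this sketch None stands in for a missing (undefined) limiting value, a zero step is rejected because the text does not say what it should produce, and the function name is hypothetical.

```python
from itertools import count

def step_list(initial, limit=None, step=1):
    """Yield initial + n*step for n = 0, 1, 2, ... within the inclusive bounds."""
    if step == 0:
        raise ValueError("zero step is not covered by this sketch")
    if limit is None:
        # Rule (3): no limiting value, so the list is infinite.
        return (initial + n * step for n in count())
    low, high = min(initial, limit), max(initial, limit)

    def bounded():
        n = 0
        # Rule (4): keep values lying between the initial and limiting values.
        while low <= initial + n * step <= high:
            yield initial + n * step
            n += 1

    return bounded()
```

For instance, list(step_list(2, -16, -2)) corresponds to the example 2 by minus 2 to minus 16, and itertools.islice can take a finite prefix of an unbounded list such as 1 by 1.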


Syntax of Assignments.

<expression^> ::= <primary^> ← <expression^>


Examples of Assignments.

a ← 1,             (if x = y then z[1] else z[2]) ← 7,

b ← 1 to n,        c[2][x-z] ← "ave.",

c ← ( new x, if (length b) = 3 then out ← "3",

      out ← out © cr,
      X out )


Semantic  Description  of  Assignments.  The  primary  on  the  left 
must  have  a  value  of  type  name.  If  it  does,  the  value  of  the  named 
variable  is  replaced  by  the  value  of  the  expression  on  the  right.  The 
value  of  the  expression  is  also  the  value  of  the  assignment. 

Justification of Assignments. The assignment allows the saving of
temporary intermediate values. We also provide some flexibility in the
designated variable on the left of the arrow (i.e., subscripted or
unsubscripted identifiers and the parenthesized expressions). Both of

    (if a = b then c else d) ← 3,
    a ← Ⓟ c, a ← 3,

are meaningful, and, if a initially equals b, have the same effect.
In the first case the principle of deferment demands delaying the fetch
of the value of c until the end of the conditional expression at which
point we discover that it is the name that we want. In the second, we
delay until after return from the procedure. The latter case is exactly
the Algol 60 call-by-name construct.
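
The two forms can be imitated in Python by treating variables as explicit cells and a process as a zero-argument function that returns a cell. This is a sketch under assumptions: the class Var and the helpers resolve, assign, and fetch are hypothetical names, and a Python callable stands in for a value of type process.

```python
class Var:
    """A variable cell; a reference to a Var plays the role of a value of type name."""
    def __init__(self, value=None):
        self.value = value

def resolve(var):
    """While the variable holds a process, activate it and continue with the name it yields."""
    while callable(var.value):
        var = var.value()
    return var

def assign(target, value):
    """Store value in the variable finally designated by target; the value is also the result."""
    resolve(target).value = value
    return value

def fetch(target):
    """Deferred fetch: the value is read only after the name has been settled."""
    return resolve(target).value

# First form: the conditional selects a name (a cell), not a value.
a, b, c, d = Var(1), Var(1), Var(0), Var(0)
assign(c if fetch(a) == fetch(b) else d, 3)   # stores 3 in c

# Second form: p holds a process that yields the name c.
p = Var()
assign(p, lambda: c)
assign(p, 5)                                   # stores 5 in c, not in p
```

The design point mirrored here is that the fetch of a value is postponed until the context shows whether the name or the value is wanted.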

Syntax of Procedures.

<expression^> ::= <procedure> <expression^>

<procedure>   ::= Ⓟ

<primary^>    ::= identifier <list> | identifier


Examples of Procedures.

increase ← Ⓟ a ← a + 3. ,
increase ← (new a, a ← a + 1) ,

increase(Ⓟ a) ,
factorial ←
    (new n,
     if n = 0 then 1 else n × factorial(n-1)
    )[1]

Semantic Description of Procedure Definition. A procedure
definition is denoted by the mark Ⓟ followed by an expression called
the procedure expression. The execution of a procedure definition
produces a value of type process. If the procedure expression is not an
explicit list (or an explicit list followed by subscripts) then it is
called a parameterless procedure.

Semantic Description of Parameterless Procedure Activation. (3-11)
Whenever the name of a variable is computed, that variable is inspected
to determine whether or not it contains a value of type process. If it
does, the process is activated and the name of the variable is replaced
with the resulting value. If that value is again of type name, the test
is repeated, etc., until a value of some other type is returned. If, at
the time of procedure activation, all of the variables valid at the place
of procedure definition are defined, then the effect and the resulting
value are the same as would be obtained by executing the procedure
expression in the same environment at the place of definition.

Semantic Description of a Procedure with Parameters. (3-12)
If the procedure expression is an explicit list, then it has a (perhaps
null) list of identifiers local to the scope defined by the list. We
call the variables allocated to these identifiers the first, second,
third, etc., initializable variables of that procedure.

Semantic Description of the Activation of a Procedure with
Parameters. If the procedure activation is signified by an identifier
followed by an explicit list, we call the list the actual parameter list.
If the variable allocated to the identifier does not contain a value of
type process, a value of type undefined is returned. Otherwise the
activation is identical to that for the parameterless procedure† except
that the initializable variables of the procedure are given the values of
the corresponding actual parameters.

Justification of Values of Type Process.†† The ability to define
a process that can be activated upon demand is present in some form in
Algol 60 procedures, functions, switches and name parameters. We have
in the kernel language a single process defining construct. The value
of a process may be of any type and the value may depend upon where the
process is activated. (For instance, if a process is activated to the
left of a replacement arrow it may return a value of type name rather
than the actual value of the named variable.) Process recursion, the
programming analogue of mathematical induction, is frequently the most
natural way of expressing an algorithm in the kernel language.

The second and third examples above show the kernel language
equivalent of Algol 60 name parameters. The local variable a is
initialized to the procedure to compute a, a non-local variable.
Each occurrence of the identifier a in the list body causes the procedure
to be activated. The first activation yields the name of the non-local
a since it is called to the left of the assignment arrow; the second
yields the value. The result is that the non-local a is increased by 1.

† Note that since every access to the procedure identifier causes a
procedure activation, there is no equivalent to the Algol 60 procedure
assignment statement. If the procedure has parameters it is necessarily
list-valued unless, as in the factorial example, a subscript is used to
select the desired value.

†† Values of type process are similar to the quotations of Euler.
The difference is that Euler quotations behaved differently when passed
as parameters and when stored in local variables. We have eliminated
the distinction.

Implementation of Primaries of Type Process. The necessity of
accessing a word to compute its address is a consequence of the general
call-by-name concept from Algol 60. The provision for a special
fast-access bit associated with the word is required for efficient
implementation.‡

Syntax of While-Controlled Iterations.

<expression^>  ::= <while clause> do <expression^>
<while clause> ::= <while> <step list>
<while>        ::= while


Examples of While-Controlled Iterations.

while in[1] ≠ " "[1] do (a ← a © in[(1)],
                         in ← in[2 to length in]
                        )

while x ↑ 2 ≠ a do x ← (x + a ÷ x) ÷ 2


Semantic Description of While-Controlled Iterations. A while-
controlled iteration consists of a while clause and a controlled expression.
The while clause is evaluated; if it has value true then the controlled
expression is evaluated and we return to re-evaluate the while clause;
if it has value false we terminate the iteration and the value of the
while-controlled iteration is the list of values of the controlled
expression; if it has any other value, the iteration is terminated with a
value of type undefined.

‡ The Burroughs B5500 has the special bit (called the flag bit) but it
can be examined by the hardware only by accessing the word. Thus even
in the assignment

    a ← a

three memory references are required.
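
The three outcomes of the while clause can be sketched in Python. The names while_do and UNDEFINED are hypothetical, the while clause and the controlled expression are passed as zero-argument functions so that they are re-evaluated on each pass, and a sentinel object stands in for a value of type undefined.

```python
UNDEFINED = object()   # stand-in for a value of type undefined

def while_do(clause, controlled):
    """Collect the values of the controlled expression while the clause is true."""
    values = []
    while True:
        test = clause()
        if test is True:
            values.append(controlled())
        elif test is False:
            return values          # normal termination: the list of values
        else:
            return UNDEFINED       # clause was neither true nor false
```

A countdown from 3, for example, returns the list [2, 1, 0] when the controlled expression decrements a counter and yields its new value.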

Syntax of For-Controlled Iterations.

<expression^>  ::= <for clause> do <expression^> |
                   <for clause> <while clause> do <expression^>
<for clause>   ::= for all identifier from <step list>
<while clause> ::= <while> <step list>
<while>        ::= while


Examples of For-Controlled Iterations.

for all I from 1 to n do S ← S + I ↑ 3,

+ / for all I from 1 to n do I ↑ 3,

for all t from table while looking do
    if t = object then looking ← false else emit(0)

Semantic Description of For-Controlled Iterations. (3-13)
The for-controlled iteration provides for the execution of the controlled
expression a fixed number of times or a fixed number of times with
the possibility of an early termination. The step list of the for clause
is evaluated once; if it is not list or set valued, the value of the
for-controlled expression is of type undefined. The scope of the
identifier of the for clause is the controlled expression. The variable
allocated to the identifier assumes in order each value from the iteration
set and the controlled expression is executed. If there is a while clause
and its value is not true before the execution of the controlled
expression, the iteration is terminated.

The value of the for-controlled expression is the list of values
assumed by the controlled expression.
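
A Python sketch of this behavior follows. The names for_all and UNDEFINED are hypothetical; the controlled expression is a one-argument function of the iteration variable, the optional while clause is a zero-argument function tested before each execution, and Python sets are visited in sorted order to mimic the sorted sets of the kernel language.

```python
UNDEFINED = object()   # stand-in for a value of type undefined

def for_all(values, controlled, while_clause=None):
    """Return the list of values taken by the controlled expression."""
    if isinstance(values, (set, frozenset)):
        values = sorted(values)
    elif not isinstance(values, (list, tuple, range)):
        return UNDEFINED               # iteration source is not list or set valued
    results = []
    for v in values:
        if while_clause is not None and while_clause() is not True:
            break                      # early termination
        results.append(controlled(v))
    return results
```

The second example above, the sum of cubes, then reads sum(for_all(range(1, n + 1), lambda i: i ** 3)).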

Syntax of Conditional Expressions.

<expression>   ::= <expression^1>
<expression^>  ::= <if clause> <expression^1> | <expression^>
<expression^>  ::= <expression^>
<expression^>  ::= <if clause> <truepart> <expression^1>
<if clause>    ::= if <expression> then
<truepart>     ::= <expression^> else

Examples of Conditional Expressions.

if x = y then if y ≠ z then x ← y max z ,
if test(7) then (x ← 1, y ← 2) else x ← 3 ,
if if A ⊂ B then true else z ∈ B then B ← ( )

Semantic Description of Conditional Expressions. The first
form of conditional expression is an if clause followed by an expression.
The if clause is evaluated; if it has value true the expression is
evaluated and the value obtained is the value of the conditional
expression; otherwise the value of the conditional expression is of type
undefined.

For the second form we evaluate the if clause; if it
is true we evaluate the truepart expression; if it is false we evaluate
the final expression; otherwise we create a value of type undefined.
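
The two forms of conditional expression can be sketched in Python. The names conditional and UNDEFINED are hypothetical; the branches are passed as zero-argument functions so that only the selected one is evaluated, and a sentinel object stands in for a value of type undefined.

```python
UNDEFINED = object()   # stand-in for a value of type undefined

def conditional(test, true_part, final_part=None):
    """One-armed form when final_part is None, two-armed form otherwise."""
    if test is True:
        return true_part()
    if final_part is not None and test is False:
        return final_part()
    # One-armed form with a non-true test, or a test that is neither true nor false.
    return UNDEFINED
```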


Syntax  of  Programs . 


<program> ::= ⊢ <expression> ⊣

Semantic  Description  of  a  Program.  The  value  of  a  program  is  the 
value  of  the  expression.  Note  that  by  the  nature  of  the  kernel  language 
(identification  of  Algol  60  blocks  and  values  of  type  list)  the  value  of 
a  program  will  be  a  list  structure  of  the  intermediate  results. 

Implementation of a Program. On account of the copious list
structure generated by a program, we must have some form of remote storage
and recall mechanism. The list structure of the program is well suited
for segmentation and overlay.

In language design, we attempt to carry the EULER development
by Wirth and Weber to a more concise and powerful form. We
advocate languages that are minimal and involuted. A minimal
language combines into a single construct any two conceptually
similar but notationally different constructs. An involuted
language avoids constructs that are applicable only in local
context. In the resulting language we find such previously
diverse constructs as lists, parameter lists, blocks, compound
statements, for lists, and arrays to be identical. After
combining the features of the reduced EULER with some ideas from
Iverson and PL/I we find that our control over the flow of
execution within a program is sufficiently complete that we
can discard the traditional label and go-to statement as
irrelevant.


As a final example of the kernel language, we present an
extendable compiler written in the kernel language itself.

Our conclusions are that the precedence grammar techniques
are quite efficient and useful. Further improvement could make
them substantially superior to other methods of compiler
generation. We believe that the computing community would be better
served with a minimal common language which the user would
routinely extend than by any large general purpose language.
Finally we believe that the growing agreement on the constructs
common to all programming tasks should have a much more significant
effect upon machine design than is presently the case.


